DOCUMENT RESUME

ED 230 559                                             TM 820 491

AUTHOR        Kingston, Neal M.; Dorans, Neil J.
TITLE         The Feasibility of Using Item Response Theory as a
              Psychometric Model for the GRE Aptitude Test.
INSTITUTION   Educational Testing Service, Princeton, N.J.;
              Graduate Record Examinations Board, Princeton, N.J.
REPORT NO     ETS-RR-82-12; GREB-79-12P
PUB DATE      Apr 82
NOTE          168p.; Some tables may be marginally legible due to
              small print.
PUB TYPE      Reports - Research/Technical (143)

EDRS PRICE    MF01/PC07 Plus Postage.
DESCRIPTORS   Aptitude Tests; *Graduate Study; Higher Education;
              *Latent Trait Theory; *Mathematical Models;
              Psychometrics; Standardized Tests; *Statistical
              Analysis; *Testing Programs; *Test Items
IDENTIFIERS   *Graduate Record Examinations; Robustness; Three
              Parameter Model



ABSTRACT 

The feasibility of using item response theory (IRT)
as a psychometric model for the Graduate Record Examination (GRE)
Aptitude Test was addressed by assessing the reasonableness of the
assumptions of item response theory for GRE item types and examinee
populations. Items from four forms and four administrations of the
GRE Aptitude Test were calibrated using the three-parameter logistic
item response model. Three equating methods were compared in this
research: equipercentile equating, linear equating, and item response
theory true score equating. Various data collection designs (for both
IRT and non-IRT methods) and several item parameter linking
procedures (for the IRT equatings) were employed. The IRT methods
produced quantitative scaled score means and standard deviations that
were higher and lower, respectively, than those produced by the
linear and equipercentile methods. The most notable finding in the
analytical equatings was the sensitivity of the precalibration design
(used only for the IRT equating method) to practice effects on
analytical items, particularly for the analysis of explanations item
type. Since the precalibration design is the data collection method
most appealing (for administrative reasons) for equating the GRE
Aptitude Test in a test disclosure environment, this sensitivity
might present a problem for any equating method. (PN)



***********************************************************************
* Reproductions supplied by EDRS are the best that can be made        *
* from the original document.                                         *
***********************************************************************



THE FEASIBILITY OF USING ITEM RESPONSE
THEORY AS A PSYCHOMETRIC MODEL FOR
THE GRE APTITUDE TEST



Neal M. Kingston 
and 

Neil J. Dorans 





GRE Board Professional Report GREB No. 79-12P
ETS Research Report 82-12



April 1982 



This report presents the findings of a
research project funded by and carried
out under the auspices of the Graduate
Record Examinations Board.












Copyright © 1982 by Educational Testing Service. All rights reserved.



Abstract 

The feasibility of using item response theory as a psychometric model
for the GRE Aptitude Test was addressed by assessing the reasonableness of
the assumptions of item response theory for GRE item types and examinee
populations. Items from four forms and four administrations of the
GRE Aptitude Test were calibrated using the three-parameter logistic
item response model (one form was given at two administrations and one
administration used two forms; the exact relationships between forms and
administrations are given in the Test Forms and Populations section of this
report).



The unidimensionality assumption of item response theory was addressed
in a variety of ways. Previous factor analytic research on the GRE
Aptitude Test was reviewed to assess the dimensionality of the test and to
extract information pertinent to the construction of sets of homogeneous
items. On the basis of this review, separate calibrations of discrete
verbal items and reading comprehension items were run, in addition to
calibrations on all verbal items, because two strong dimensions on the
verbal scale were identified in the factor analytic research.

Local independence of item responses is a consequence of the
unidimensionality assumption. To test the weak form of the local independence
condition, partial correlations, both with and without a correction for
guessing, among items with ability partialled out were computed and factor
analyzed. Violations of local independence were observed in both verbal
item types and quantitative item types. These violations were basically
consistent with expectations based on the factor analytic review.

Fit of the three-parameter logistic model to GRE Aptitude Test data
was assessed by comparing estimated item-ability regressions, i.e., item
response functions, with empirical item-ability regressions. The
three-parameter model fit all verbal item types reasonably well. The fit to
data interpretation items, regular math items, analytical reasoning items,
and logical diagrams items also seemed acceptable. The model fit
quantitative comparison items least well. The analysis of explanations
item type was also not fit well by the three-parameter logistic model.

The stability of item parameter estimates for different samples was
assessed. Item difficulty estimates exhibited a large degree of stability,
followed by item discrimination parameter estimates. The hard-to-estimate
lower asymptote or pseudoguessing parameter exhibited the least temporal
stability.

The sensitivity of item parameter estimates to the lack of
unidimensionality that produced the local independence violations was examined.
The discrete verbal and all verbal calibrations of discrete verbal
items produced more similar estimates of item discrimination than the
reading comprehension and all verbal calibrations of reading comprehension
items, reflecting the larger correlations that overall verbal ability
estimates had with discrete verbal ability estimates. As compared to item
discrimination estimates, item difficulty estimates exhibited much less
sensitivity to homogeneity of item sets. The estimates of the lower
asymptote were, for the most part, fairly robust to homogeneity of item
calibration set.

The comparability of ability estimates based on homogeneous item sets
(reading comprehension items or discrete verbal items) with estimates based
on all verbal items was examined. Correlations among overall verbal
ability estimates, discrete verbal ability estimates, and reading
comprehension ability estimates provided evidence for the existence of two
distinct, highly correlated verbal abilities that can be combined to
produce a composite ability that resembles the overall verbal ability
defined by the calibration of all verbal items together.

Three equating methods were compared in this research: equipercentile
equating, linear equating, and item response theory true score equating.
Various data collection designs (for both IRT and non-IRT methods) and
several item parameter linking procedures (for the IRT equatings) were
employed. The equipercentile and linear equatings of the verbal scales
were more similar to each other than they were to the IRT equatings. The
degree of similarity among the scaled score distributions produced by the
various equating methods, data collection designs, and linking procedures
was greater for the verbal equatings than for either the quantitative or
analytical equatings. In almost every comparison, the IRT methods
produced quantitative scaled score means and standard deviations that were
higher and lower, respectively, than those produced by the linear and
equipercentile methods. The most notable finding in the analytical
equatings was the sensitivity of the precalibration design (in this study,
used only for the IRT equating method) to practice effects on analytical
items, particularly for the analysis of explanations item type. Since the
precalibration design is the data collection method most appealing (for
administrative reasons) for equating the GRE Aptitude Test in a test
disclosure environment, this sensitivity might present a problem for any
equating method.

In sum, the item response theory model and IRT true score equating,
using the precalibration data collection design, appear most applicable to
the verbal section, less applicable to the quantitative section because of
possible dimensionality problems with data interpretation items and
instances of nonmonotonicity for the quantitative comparison items, and
least applicable to the analytical section because of severe practice
effects associated with the analysis of explanations item type. Expected
revisions of the analytical section, particularly the removal of the
troublesome analysis of explanations item type, should enhance the fit and
applicability of the three-parameter model to the analytical section.
Planned revisions of the verbal section should not substantially affect the
satisfactory fit of the model to verbal item types. The heterogeneous
quantitative section might present problems for item response theory. It
must be remembered, however, that these same (and other) factors that
affect IRT-based equatings may also affect other equating methods.



TABLE OF CONTENTS

INTRODUCTION
   Assumptions of Item Response Theory
   Assessing the Reasonableness of the Assumptions
      Review of pertinent factor analytic research
      Weak form of local independence
      Item-ability regressions
      Comparisons based on homogeneous and heterogeneous subsets of items
      Position or practice effect
   Robustness of IRT Equating to Violations of Assumptions

REVIEW OF FACTOR ANALYTIC RESEARCH
   Study I
   Study II
   Study III
   Study IV
   Synthesis
      The difficulty factor problem
      Implications for GRE-IRT feasibility research

TEST FORMS AND POPULATIONS
   Test Forms
   Populations

PARAMETER ESTIMATION AND ITEM LINKING
   Item Calibration Procedures
   Item Linking Plan
   Item Linking Procedures
   Results of Linking Test Forms

ASSESSING THE WEAK FORM OF LOCAL INDEPENDENCE: EXAMINATION OF PARTIAL
CORRELATIONS AMONG GRE ITEMS CONTROLLING FOR EXAMINEE ABILITY
   Implications of Local Independence
   Analysis of Partial Correlations
      Correction for guessing
   Results for the Verbal Test
      Factor analysis of partial correlations
   Results for the Quantitative Test
      Factor analysis of partial correlations
   Summary and Synthesis
      Principal findings for the Verbal Test
      Principal findings for the Quantitative Test
      Synthesis with previous factor analytic results

ANALYSIS OF ITEM-ABILITY REGRESSIONS

COMPARABILITY, SENSITIVITY, AND STABILITY OF PARAMETER ESTIMATES
   Temporal Stability of Item Parameter Estimates
   Sensitivity of Item Parameter Estimates to Violations of Unidimensionality
   Comparability of Ability Estimates Based on Homogeneous and
      Heterogeneous Sets of Items

IRT EQUATING: COMPARABILITY WITH LINEAR AND EQUIPERCENTILE EQUATING
   Equating Methods
   Equating Plan
   Judging the Adequacy of Equatings
   Results
      Verbal equatings
      Quantitative equatings
      Analytical equatings
   Discussion of equatings
      Verbal equatings
      Quantitative equatings
      Analytical equatings
      Shifts in dimensionality

SUMMARY AND DISCUSSION
   Summary
      The basic assumptions of item response theory
      Implications of previous factor analytic research on GRE Aptitude Test
      Assessment of the weak form of local independence
      Analysis of item-ability regressions
      Temporal stability of item parameter estimates
      Sensitivity of item parameter estimates to violations of unidimensionality
      Comparability of ability estimates based on homogeneous and
         heterogeneous sets of items
      Equating comparisons
   Synthesis
      Fit of item response theory model to the GRE Aptitude Test items
         and examinee populations
      Applicability of item response theory equating methods

REFERENCES

APPENDIX A: Score Conversion Tables for Various Equatings of the
            Verbal, Quantitative, and Analytical Scales of
            Forms 3CGR1, ZGR1, K-ZGR2, and K-ZGR3

APPENDIX B: Relative Efficiency Curves for Various Score Scales
            Produced by Different IRT Equating Methods on
            Forms 3CGR1, ZGR1, K-ZGR2, and K-ZGR3



INTRODUCTION 

The use of item response theory as a psychometric model for the GRE
Aptitude Test can provide a powerful set of statistical tools for analysis
of items and tests, maintenance of score scales via equating, and development
of better and more efficient test forms (Cowell, 1979; Hambleton and Cook,
1977; Hambleton, 1980; Lord, 1977, 1980a; Marco, 1977; and Warm, 1978).
Determination of the applicability of IRT methods to the GRE Aptitude Test
requires an assessment of the psychometric feasibility of using IRT as a
mathematical model for item responses on the GRE Aptitude Test. Psychometric
feasibility can be addressed by examining the reasonableness and importance
of the underlying assumptions of IRT for GRE populations and item types.
The present research addresses the reasonableness of these assumptions and
the robustness of IRT methods to violations of these assumptions.



Assumptions of Item Response Theory

Item response theory provides a mathematical expression for the
probability of success on an item as a function of a single characteristic
of the individual answering the item, his or her ability, and multiple
characteristics of the item. This mathematical expression is called an
item response function. Both on psychometric grounds and for reasons of
tractability, a reasonable mathematical form for the item response function
of a multiple choice item is the three-parameter logistic model,

$$P_g(\theta) = c_g + \frac{1 - c_g}{1 + e^{-1.7 a_g (\theta - b_g)}}, \tag{1}$$

where

$P_g(\theta)$ is the probability that an examinee with ability $\theta$ answers
item g correctly,

$e$ is the base of the system of natural logarithms, approximately
equal to 2.7183,

$a_g$ is a measure of item discrimination for item g,

$b_g$ is a measure of item difficulty for item g, and

$c_g$ is the lower asymptote of the item response curve, the probability
of very low ability examinees answering item g correctly.

In equation (1), $\theta$ is the ability parameter, a characteristic of the
examinee, and $a_g$, $b_g$, and $c_g$ are item parameters that determine the shape
of the item response function (see Figure 1).
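As an illustration only, equation (1) can be evaluated directly. The sketch below is not part of the original report; the function name `irf_3pl` and the item parameter values are hypothetical choices made for the example.

```python
import math

D = 1.7  # scaling constant that appears in the exponent of equation (1)


def irf_3pl(theta, a, b, c):
    """Three-parameter logistic item response function, equation (1).

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote. Names are illustrative, not from the report.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item with a = 1.0, b = 0.0, c = 0.2.
    for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"theta={theta:+.1f}  P={irf_3pl(theta, 1.0, 0.0, 0.2):.3f}")
```

At $\theta = b_g$ the function returns $c_g + (1 - c_g)/2$, and for very low ability it approaches $c_g$, which matches the verbal definitions of the difficulty and lower asymptote parameters.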



Figure 1 




One of the major assumptions of IRT embodied in equation (1) is that
the set of items under study is unidimensional, i.e., the probability of
successful response by examinees to a set of items can be modelled with
only one ability parameter, $\theta$. The second major assumption embodied in
equation (1) is that the probability of successful performance on an item
can be adequately described by the three-parameter logistic model.

One consequence of the unidimensionality assumption is the
mathematical concept of local independence. There are two forms of local
independence, weak and strong. The strong form can be stated as:

$$\mathrm{Prob}(V = v \mid \theta) = \prod_{g=1}^{n} P_g(\theta)^{u_g} \, Q_g(\theta)^{1 - u_g}, \tag{2}$$

where

$V$ is a vector random variable of binary responses (right or wrong)
for the n items,

$v$ is a particular vector response pattern,

$\theta$ is the ability level,

$u_g$ is an examinee's binary response to item g, either 1 or 0,

$P_g(\theta)$ is the probability of a correct response for an examinee of
ability $\theta$,

$Q_g(\theta)$ is $1 - P_g(\theta)$, the probability of an incorrect response for an
examinee of ability $\theta$, and

$n$ is the number of items on the test.

This form is equivalent to saying that, at each ability level, item responses
are statistically independent. The weak form of local independence states
that at each $\theta$, item responses are uncorrelated.
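To make equation (2) concrete, the following sketch computes the probability of a response pattern at a fixed ability level as a product over items. It is an illustrative sketch, not code from the report; the names `pattern_probability` and `ITEMS` and the parameter values are hypothetical.

```python
import math
from itertools import product


def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function, equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def pattern_probability(theta, items, responses):
    """Equation (2): product over items of P^u * Q^(1 - u) at ability theta."""
    prob = 1.0
    for (a, b, c), u in zip(items, responses):
        p = irf_3pl(theta, a, b, c)
        prob *= p if u == 1 else (1.0 - p)
    return prob


# Hypothetical (a, b, c) parameters for a three-item test.
ITEMS = [(1.0, -1.0, 0.2), (0.8, 0.0, 0.25), (1.2, 1.0, 0.2)]

if __name__ == "__main__":
    theta = 0.5
    total = 0.0
    for v in product((0, 1), repeat=len(ITEMS)):
        p = pattern_probability(theta, ITEMS, v)
        total += p
        print(v, round(p, 4))
    print("sum over all patterns:", round(total, 6))
```

Because each item contributes a factor $P_g(\theta) + Q_g(\theta) = 1$ when summed over its two outcomes, the probabilities of all $2^n$ patterns sum to one at any fixed $\theta$, which is a quick sanity check on the product form.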



Assessing the Reasonableness of the Assumptions

A major purpose of the present research is to assess the
reasonableness of the assumptions of IRT for GRE item types and populations. There
is wide agreement (Bejar, 1980; Hambleton, Swaminathan, Cook, Eignor, &
Gifford, 1978; Lord, 1980a) that no single method exists for conclusively
determining whether a set of responses to a set of items is unidimensional.
Consequently, a variety of approaches were employed to assess the
dimensionality assumption.

Review of pertinent factor analytic research. Four factor analytic
research studies conducted on the GRE Aptitude Test were reviewed in order
to assess the dimensionality of the test and to extract information
pertinent to the construction of sets of homogeneous items. These studies
were also examined to extract hypotheses about the GRE Aptitude Test that
could be tested at later stages of the research.

Weak form of local independence. As stated earlier, local
independence among items is a mathematical consequence of the unidimensionality
assumption. If responses to a set of items are unidimensional, these
responses are statistically independent at a given level of ability.
The local independence condition was tested by computing the
tetrachoric correlation between items g and h with estimated $\theta$ partialled
out (Warm, 1978, p. 101), for every pair of items in sets of apparently
homogeneous items. These correlations were computed both with and without
a correction for guessing (Carroll, 1945). The partial correlations were
examined to identify items with large positive correlations. The matrices
of the partial correlations were then factor analyzed. Results of this
semi-nonlinear factor analysis were compared with previous linear factor
analytic results. Hypotheses were generated to explain these results.
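The idea of partialling ability out of an inter-item correlation can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the report: it substitutes ordinary Pearson correlations for tetrachoric correlations, omits the correction for guessing, and uses simulated true abilities rather than estimated ones. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z partialled out."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function, equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    thetas = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    # Two hypothetical items answered independently given theta,
    # i.e., data simulated to satisfy local independence.
    item_g = [int(rng.random() < irf_3pl(t, 1.0, -0.5, 0.2)) for t in thetas]
    item_h = [int(rng.random() < irf_3pl(t, 0.8, 0.5, 0.2)) for t in thetas]
    print("raw correlation:    ", round(pearson(item_g, item_h), 3))
    print("partial correlation:", round(partial_corr(item_g, item_h, thetas), 3))
```

For data simulated this way the raw correlation is positive because both items depend on ability, while the partial correlation should be much closer to zero. It need not be exactly zero, since linear partialling of ability from binary responses is only an approximation.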

Item-ability regressions. The item response function obtained from
the estimated item parameters can be viewed as an estimation of the
theoretical form for the regression of item score (1 = a correct response,
0 = an incorrect response) onto underlying ability. In other words, the
item response function describes expected item performance as a function
of ability. Actual item performance for a given estimated ability level
was obtained from the data and plotted for various levels of ability to
approximate an empirical item-ability regression (Hambleton, 1980; Stocking,
1980). Visual inspection of how closely the estimated item-ability
regression captured the empirical item-ability regression provided
information about how well the three-parameter logistic model fit the data.
Comparison of item-ability regressions for items calibrated in both
homogeneous sets (e.g., all reading comprehension items) and heterogeneous
sets (e.g., all verbal items) was of particular interest.
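One simple way to form an empirical item-ability regression is to group examinees by ability and compute the proportion answering the item correctly in each group. The sketch below assumes equal-sized groups of examinees sorted by ability; the report does not state the grouping it used, so this choice, along with the function names and parameter values, is an assumption made for illustration.

```python
import math
import random


def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function, equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def empirical_regression(thetas, responses, n_groups):
    """Return (mean ability, proportion correct, group size) per ability group.

    Examinees are sorted by ability and split into n_groups groups of
    (nearly) equal size; requires len(thetas) >= n_groups.
    """
    pairs = sorted(zip(thetas, responses))
    size = len(pairs) // n_groups
    points = []
    for k in range(n_groups):
        start = k * size
        stop = (k + 1) * size if k < n_groups - 1 else len(pairs)
        group = pairs[start:stop]
        mean_theta = sum(t for t, _ in group) / len(group)
        prop_correct = sum(u for _, u in group) / len(group)
        points.append((mean_theta, prop_correct, len(group)))
    return points


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sketch is repeatable
    a, b, c = 1.0, 0.0, 0.2  # hypothetical item parameters
    thetas = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    responses = [int(rng.random() < irf_3pl(t, a, b, c)) for t in thetas]
    for mean_theta, observed, n in empirical_regression(thetas, responses, 8):
        expected = irf_3pl(mean_theta, a, b, c)
        print(f"theta={mean_theta:+.2f}  observed={observed:.3f}  "
              f"model={expected:.3f}  n={n}")
```

Plotting the observed proportions against the model curve gives the kind of visual comparison described above; large, systematic gaps between the two columns would suggest poor fit for that item.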

Comparisons based on homogeneous and heterogeneous subsets of items.
In addition to visual inspection of the estimated and empirical
item-ability regressions, examination of the comparability of item parameter
estimates was used to assess the effects of heterogeneity on the fit of
the logistic model. Correlations between item parameter estimates for the
same items calibrated in a homogeneous set and in a heterogeneous set were
computed to index the degree of similarity between the item-ability
regressions. Mean differences between item parameter estimates also
provided information about the relative fit of the logistic model for sets
of homogeneous and heterogeneous items.
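The two summary indices named above, a correlation and a mean difference between two sets of estimates for the same items, can be sketched as follows. The parameter values are hypothetical numbers chosen only to exercise the functions; they are not results from the report.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(x, y):
    """Mean of the item-by-item differences x - y."""
    return sum(xi - yi for xi, yi in zip(x, y)) / len(x)


# Hypothetical discrimination estimates for the same five items,
# once from a homogeneous calibration and once from a heterogeneous one.
a_homogeneous = [0.9, 1.2, 0.7, 1.5, 1.0]
a_heterogeneous = [0.8, 1.1, 0.7, 1.3, 0.9]

if __name__ == "__main__":
    print("correlation:    ", round(pearson(a_homogeneous, a_heterogeneous), 3))
    print("mean difference:", round(mean_difference(a_homogeneous, a_heterogeneous), 3))
```

A high correlation with a nonzero mean difference would indicate that the two calibrations rank the items similarly but differ systematically in level, which is why the two indices are reported together.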

Position or practice effect. The unidimensionality assumption
implies that the only systematic influences on item performance are the
individual's ability and characteristics of the item. Given knowledge of
an individual's ability, knowledge about that individual's performance on
one item does not add any information for forecasting that individual's
performance on another item. In other words, since ability and item
characteristics are the only systematic influences on item performance,
knowledge of that individual's performance on other items is superfluous.
One practical consequence of the unidimensionality assumption is that item
position should have no effect on item performance because, if item position
affected item performance, then something other than ability would be
having a systematic effect on item performance. In short, if there is a
position effect or practice effect on item performance, the unidimensionality
assumption is violated. In the present research, the same items appeared
in two different locations on two forms of the GRE Aptitude Test, enabling
us to ascertain whether a position effect existed.

Practice effect, though a problem stemming from data collection
design, can have a major impact on the equating of test forms. Practice
effect can occur when items appear in the second section of the same item
type. Also, a general effect, perhaps induced by fatigue, might occur on
any items appearing late in a test. Any such systematic bias might not
appear when the item was later used in another position in an operational
section of the test, which would contribute to an incorrect equating.
This problem will exist (though not necessarily to the same extent) with
any equating method that makes use of data collected in one portion of the
test to equate scores based on a different portion of the test.

This report examines the impact of practice effect on IRT true score
equating. Practice effects are analyzed in greater detail in another
research report (Kingston & Dorans, 1982).


Robustness of IRT Equating to Violations of Assumptions

Few mathematical models ever fit the data completely. The
three-parameter logistic model will not completely explain expected item
performance on the GRE Aptitude Test any more than "'...a heavy point
swinging without friction on a weightless string' (which) never existed
in the real world, but at a certain stage of the process of knowledge
is a very useful model of a pendulum" (Rasch, 1960). The various methods
of assessing the fit of the model described in the section on reasonableness
of IRT assumptions provided us with knowledge about the degree to which
the model fits the data. This knowledge is synthesized with the results
of the equatings in the last section of this report.



REVIEW OF FACTOR ANALYTIC RESEARCH 

Four factor analytic research studies conducted on the GRE Aptitude
Test were reviewed in order to assess the dimensionality of the test
and to extract information pertinent to the construction of sets of
homogeneous items. The four studies are:

  I. Powers, D. E., Swinton, S. S., & Carlson, A. B. A factor
     analytic study of the GRE Aptitude Test. GRE Board
     Professional Report GREB No. 75-11P, September 1977.

 II. Powers, D. E., Swinton, S. S., Thayer, D., & Yates, A.
     A factor analytic investigation of seven experimental
     analytical item types. GRE Board Professional Report GREB
     No. 77-1P, June 1978.

III. Swinton, S. S., & Powers, D. E. A factor analytic study of
     the restructured GRE Aptitude Test. GRE Board Professional
     Report GREB No. 77-6P, February 1980.

 IV. Rock, D. A., Werts, C., & Grandy, J. Construct validity
     of the GRE across populations: an empirical confirmatory
     study. Draft Report, 1980.

The first three studies involved factor analyses conducted at the item
level on interitem tetrachoric correlations; the third study also involved
a factor analysis at the level of item parcels, i.e., items grouped together
on the basis of item difficulty and nominal item type, e.g., analogies.
Jöreskog's (1978) confirmatory factor analysis model was used in the
fourth study, where the factoring was conducted on correlations among
nominal item type parcels.



Study I 

The stated purposes of the Powers, Swinton, and Carlson (1977) study 
were to determine the factor structure of the preanalytical GRE Aptitude 
Test and to determine the structure of several experimental tests by 
relating each of these tests to the structure of the operational GRE 
Aptitude Test. At that time, the operational GRE Aptitude Test was given 
in three separately timed sections: 



  I. Discrete verbal (25 minutes)              (55 items)
       - analogies                             (18 items)
       - antonyms or opposites                 (20 items)
       - sentence completions                  (17 items)

 II. Reading comprehension (50 minutes)        (40 items)

III. Quantitative (75 minutes)                 (55 items)
       - regular math                          (40 items)
       - data interpretation                   (15 items)



The experimental tests were composed of either reading comprehension
items, regular math items, data interpretation items, or quantitative
comparison items, which at that time was an experimental item type.

Powers et al. (1977) identified three global factors, one associated
with each section of the test: general quantitative ability, general
verbal ability or reading comprehension, and vocabulary or discrete
verbal ability. In addition, they identified smaller factors including
a data interpretation factor, speed factors, and a technical reading
comprehension factor.

They used Dwyer (1937) extension analyses to extend factors from
the space of the operational GRE Aptitude Test into the space of the
experimental items, and then examined residuals. They found that the
quantitative comparison items were better explained by the general
quantitative ability factor than were the data interpretation items
already in the quantitative section. In addition, they found that
the experimental scientific or technical reading comprehension items
were not well explained by the two global verbal ability factors of
reading comprehension and vocabulary.



Study II 

The stated purposes of the Powers, Swinton, Thayer, and Yates (1978)
study were to assess, from a factor analytic point of view, the
relationships between two preanalytical versions of the GRE Aptitude Test and
seven experimental abstract reasoning or analytical item types and to
replicate the factor structure uncovered by Powers et al. (1977).

They identified three global factors on the operational GRE Aptitude
Test: general quantitative ability, reading comprehension or connected
discourse, and vocabulary or discrete verbal ability. In addition they
noted some smaller factors including a data interpretation factor, speed
factors on the verbal sections, and a specific content reading
comprehension factor. The results of Dwyer extension analyses of these operational
factors into the space of each type of analytical item revealed that the
logical diagrams and analytical reasoning items tended to load more on the
quantitative factors than did the analysis of explanations items, which
appeared to be the most complex of these three types of analytical
items.

Study III 

Since the GRE IRT feasibility research was conducted on the current
restructured version of the GRE Aptitude Test, the recently completed
Swinton and Powers (1980) factor analysis of the restructured GRE Aptitude
Test is the most pertinent of the four factor analytic studies. Forms
ZGR1 and ZGR2, the first forms containing analytical items on an
operational basis, were studied by Swinton and Powers to provide a factor
analytic description of the new restructured test and to compare this
structure to the factor structure of the former test. There are four
separately timed operational sections of the restructured GRE Aptitude
Test:



  I.  Verbal ability (50 minutes)                  (80 items)
        - discrete verbal                          (55 items)
        - reading comprehension                    (25 items)

 II.  Quantitative ability (50 minutes)            (55 items)
        - quantitative comparison                  (30 items)
        - data interpretation & regular math       (25 items)

III.  Analytical ability (25 minutes)              (40 items)
        - analysis of explanations                 (40 items)

 IV.  Analytical ability (25 minutes)              (30 items)
        - logical diagrams
        - analytical reasoning

Both item level analyses and analyses based on item parcels were
performed. First, Swinton and Powers (1980) factored analytical items
alone and identified, after a varimax rotation (Kaiser, 1958), six factors:
one logical diagrams factor, three analysis of explanations factors, a
speed factor, and an analytical reasoning factor. Visual inspection of a
plot of eigenvalues from analytical item tetrachoric correlation matrices
with communality estimates in the diagonal reveals that Swinton and
Powers may have overfactored. On the basis of these plots, it appears
that one, maybe two, factors would have been sufficient for the purpose
of describing the major dimensions of the analytical section.

Next, Swinton and Powers factored the reduced tetrachoric correlation
matrix for all items together and identified four major factors: reading
comprehension or general verbal ability, vocabulary or discrete verbal
ability, difficult quantitative, and easy quantitative. In addition, they
identified four smaller factors: a data interpretation factor, a technical
reading comprehension factor, and two factors dealing with analytical
items. Again, from visual inspection of the eigenvalue plots, it would
appear that only four factors are needed to represent the important
dimensions of the test.

On the basis of these item level analyses, item parcels were
constructed using nominal item type, item difficulty, and in some cases,
e.g., the analysis of explanations items, item response key as facets.
For example, the 20 antonym items were clustered into five unique parcels
composed of four items each and these five item parcels differed in
difficulty. A total of 53 item parcels were constructed. The purpose
of constructing item parcels is to avoid some of the problems associated
with the factoring of binary data, such as the appearance of item difficulty
factors and the instability of tetrachorics. Constructing parcels that
differed in mean difficulty, however, may have defeated one purpose of
constructing the parcels.
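
The parceling step can be sketched in a few lines of Python. This is
an illustration only, not the procedure Swinton and Powers used: the
function names, the use of proportion correct as the difficulty index,
and the sample values are all assumptions.

```python
def make_parcels(p_values, n_parcels):
    """Group items into equal-sized parcels that differ in difficulty.

    p_values: hypothetical proportion-correct values, one per item.
    Returns a list of parcels (lists of item indices), ordered from the
    hardest parcel to the easiest.
    """
    if len(p_values) % n_parcels != 0:
        raise ValueError("items must divide evenly into parcels")
    size = len(p_values) // n_parcels
    # Sort item indices from hardest (lowest p) to easiest (highest p).
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return [order[k * size:(k + 1) * size] for k in range(n_parcels)]


def parcel_scores(responses, parcels):
    """Sum one examinee's 0/1 item responses within each parcel."""
    return [sum(responses[i] for i in parcel) for parcel in parcels]


# Twenty made-up antonym items clustered into five four-item parcels.
antonym_p = [0.30 + 0.03 * i for i in range(20)]
parcels = make_parcels(antonym_p, 5)
```

Because the items are sorted before they are grouped, the parcels differ
in mean difficulty by construction, which is the feature the preceding
paragraph questions.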




In their factor analysis of the 53 item parcels, Swinton and Powers'
varimax factors were called verbal reasoning, quantitative, and vocabulary,
while the remaining three varimax factors were called technical reading
comprehension, data interpretation, and analytical. The six oblimin
(Jennrich & Sampson, 1966) factors were called easy items, quantitative,
vocabulary, technical reading comprehension, data interpretation, and
analytical. Easy items is obviously a difficulty factor. Finally, the
geoplane (Yates, 1974) solution produced a reading comprehension and
sentence completion factor, a general quantitative factor, a vocabulary
factor, an analytical factor, a data interpretation and technical reading
comprehension factor, and an easy quantitative factor.



Study IV 

To assess the construct validity of the restructured GRE Aptitude
Test, Rock, Werts, and Grandy (1980) employed Joreskog's (1978)
confirmatory factor analysis model to evaluate various psychometric
models for the GRE Aptitude Test by testing progressively more
restrictive hypotheses about the relationships between observed scores
and underlying true scores or factor scores. Their analysis was performed
at the level of nominal item type; 20 scores were produced, odd-even half
scores for each of the 10 nominal item types. Since nominal item type
score was the level of analysis, their report does not have direct
implications for the evaluation of the dimensionality of items. The
report, however, is indirectly relevant.

In particular, examination of the 20-by-20 correlation matrix for
these 20 odd-even item type scores is informative. The discrete verbal
or vocabulary scores all correlate highly. The two reading comprehension,
the two quantitative comparisons, and the six analytical all correlate
highly. The two data interpretation scores tend to have the lowest
correlations with all other scores.



Synthesis

The difficulty factor problem. Before synthesizing these four 
studies and discussing their implications for GRE IRT equating research, 
a brief discussion of the perils of using factor analytic techniques with 
binary data is appropriate.

The common factor model (Thurstone, 1947) is frequently employed to
assess the dimensionality of a test or set of tests. It is a model that
postulates a linear relationship between observed attributes, such as
those measured by tests, and underlying basic attributes or factors.

The appearance of "difficulty factors" complicates the application
of factor analytic techniques to binary data such as multiple-choice
items. The difficulty factor problem has long been recognized in the
psychometric literature. McDonald (1967) presents a brief review of the
difficulty factor literature, mentioning work by Guilford (1941), Ferguson
(1941), Wherry and Gaylord (1944), Carroll (1945), Gourlay (1951), and
Gibson (1959, 1960) among others. Guilford obtained a factor that was
related to item difficulty in his analyses of the Seashore Test of Pitch
Discriminations. Ferguson demonstrated that a matrix of phi coefficients
for homogeneous items, i.e., items measuring the same ability, would have
a rank greater than one if items differed widely in difficulty. Wherry
and Gaylord concluded that the appearance of Ferguson's difficulty factor
was due to use of the wrong correlation coefficient and recommended use of
the tetrachoric correlation for factoring binary data. Both Carroll and
Gourlay indicated conditions under which tetrachorics might yield a
difficulty factor. Carroll demonstrated that, under guessing conditions,
the obtained correlation of tests or items decreases as the tests or items
become less similar in difficulty and that the obtained correlation
between pairs of items decreases as their average difficulty becomes
greater. Gibson claimed that difficulty factors can be considered caused
by the nonlinear regression of tests on factors. The point of this brief
review is to demonstrate that difficulty factors are a problem to contend
with when interpreting the results of factor analytic studies.
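
Ferguson's point can be illustrated with a small calculation. The sketch
below is not a reproduction of his analysis. It assumes items that form a
perfect (Guttman) scale, so that every examinee who passes a harder item
also passes every easier one; the function names and pass proportions are
hypothetical. Under a single common factor the off-diagonal correlations
must satisfy the tetrad condition r12 * r34 = r13 * r24, so a nonzero
tetrad difference signals that one factor cannot reproduce the phi
coefficients.

```python
from math import sqrt


def phi(p_both, p_i, p_j):
    """Phi coefficient for two 0/1 items from their pass proportions."""
    return (p_both - p_i * p_j) / sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))


def guttman_phi(p_i, p_j):
    """Phi for two items on a perfect scale (assumed for illustration).

    Everyone who passes the harder item also passes the easier one, so
    the proportion passing both equals the smaller pass proportion.
    """
    return phi(min(p_i, p_j), p_i, p_j)


# Hypothetical proportions passing four items, hardest to easiest.
p = [0.2, 0.4, 0.6, 0.8]
r = [[guttman_phi(p_i, p_j) for p_j in p] for p_i in p]

# One common factor implies r12 * r34 == r13 * r24 (tetrad condition).
tetrad = r[0][1] * r[2][3] - r[0][2] * r[1][3]
print(round(r[0][1], 3), round(r[0][3], 3), round(tetrad, 3))
```

Although these items measure exactly the same thing, the phi coefficient
shrinks as the difficulty gap widens, and the tetrad difference is far
from zero.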

Difficulty factors appeared in the Swinton and Powers (1980) factor 
analysis of the restructured GRE Aptitude Test. The varimax rotation of 
the unrotated factor matrix, obtained from factoring the reduced tetrachoric 
correlation matrix among all items, produced a difficult quantitative 
factor and an easy quantitative factor. On the basis of these results, 
the authors used item difficulty as a facet in the construction of item 
parcels. As a consequence, difficulty factors appeared in both the 
oblimin and geoplane solutions. The appearance of these difficulty 
factors complicates the interpretation of the results. For the purpose of 
constructing sets of homogeneous items for the present research, it seemed 
reasonable to ignore these difficulty factors since the three-parameter 
logistic IRT model allows for differential difficulty among items. 

Implications for GRE IRT feasibility research. Despite the
interpretative complications induced by the appearance of difficulty
factors, the Swinton and Powers study of the restructured GRE Aptitude
Test had definite implications for the construction of sets of
homogeneous items for the GRE IRT equating research. Along with the other
three factor analytic studies, this study provided strong evidence for
the existence of three large global factors: general quantitative
ability, reading comprehension or general verbal reasoning, and
vocabulary. An obvious implication of this finding is that separation of
reading comprehension items from other verbal items would produce two
sets of items that are more homogeneous than the original set of all
verbal items.

Swinton and Powers provided evidence for the multidimensionality
of the analytical scale. They retained six factors for orthogonal rotation.
While perhaps six factors are necessary to explain the bulk of the score
variance, it is likely that only one or two of these factors represent
major psychological dimensions. The factor analysis of the item parcels
supports this parsimonious position because it produced a single analytical
factor despite the fact that item response choice was one of the facets
used in the construction of item parcels. If there are two analytical
factors, one is probably a quantitative factor and the other is probably a
verbal factor. Examination of the rotated factor patterns revealed that
the analytical items loaded highly on both the quantitative and verbal
factors.

The identification of a single small analytical factor in the factor
analysis of item parcels suggested that separation of analytical item 
types into more homogeneous sets is unnecessary. On the other hand, the 
fact that the analytical items loaded highly on the quantitative and 
verbal factors, particularly reading comprehension, suggested that these 
items are complex. Unfortunately, the fact that the items load on both 
quantitative and verbal factors would have made it difficult to construct 
sets of more homogeneous items. In light of this difficulty and the fact 
that the composition of the analytical section was under revision, a 
decision was made to focus on the quantitative and verbal sections and to 
ignore the analytical section for the most part. 

The factor analysis review suggests the existence of a small data 
interpretation factor and a small technical reading comprehension factor, 
as well as speed factors, particularly in the verbal section. For the
sake of homogeneity, separating data interpretation items from other 
quantitative items might have been a wise course of action. The same 
argument could be made for the technical reading comprehension items. 

It was decided, however, not to construct separate technical reading
comprehension and data interpretation scales as there would not have been
enough items in the anchor tests to permit stable linkings of ability
scales through item difficulty parameters. For example, it would have
been necessary to use an anchor test containing 10 items to link the data
interpretation scale for form ZGR1 to the data interpretation scale for
form 3CGR1. Outliers could have a large impact on the equation that links
these two scales. If the guidelines for score equating pertain to linking
of scales through IRT item difficulty parameters, the anchor test should
contain a minimum of 20 items.
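
The sensitivity of a short anchor test to outliers can be shown with a
small sketch. It uses a mean and sigma linking of difficulty estimates,
chosen here only for illustration; this passage does not say that the
linking equation was obtained this way. The function name and all
difficulty values are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_link(b_from, b_to):
    """Slope A and intercept B such that b_to is about A * b_from + B.

    b_from, b_to: difficulty estimates for the same anchor items on the
    two scales being linked.
    """
    slope = pstdev(b_to) / pstdev(b_from)
    intercept = mean(b_to) - slope * mean(b_from)
    return slope, intercept


# Ten made-up anchor items whose difficulties are related exactly by
# b_to = 1.2 * b_from + 0.5.
b_from = [-2.0 + 0.5 * i for i in range(10)]
b_to = [1.2 * b + 0.5 for b in b_from]
A, B = mean_sigma_link(b_from, b_to)

# Disturb a single anchor item and link again.
b_to_outlier = b_to[:-1] + [b_to[-1] + 1.5]
A_out, B_out = mean_sigma_link(b_from, b_to_outlier)
print(round(A, 3), round(B, 3), round(A_out, 3), round(B_out, 3))
```

With only 10 anchor items, one aberrant difficulty estimate changes the
slope noticeably; with more anchor items the same disturbance would carry
less weight.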

Another reason for not constructing separate technical reading
comprehension and data interpretation scales was the existence of a
certain skepticism concerning the importance of these factors. Since both
these factors are small, one or both might be tiny minor factors (in the
Tucker, Koopman, and Linn (1969) sense) that have been elevated to the
level of common factors by overfactoring. In the Tucker, Koopman, and
Linn model, a distinction is made between two systematic sources of
covariation among observed scores: major factors and minor factors.
Major factors are the common factors of the common factor model, systematic
sources of covariation among observed scores that are viewed as important
psychological dimensions. In contrast, minor factors are systematic
sources of covariation among observed scores that also exist in the data
but are not a part of the common factor model. These minor factors,
which influence performance, are not viewed as important dimensions but
rather as nuisance components that negatively affect the fit of the factor
model to the data. In an effort to describe all systematic covariation
among item scores, Swinton and Powers (1980) may have extracted both major
and minor factors.

Technical reading comprehension is possibly a form specific minor
factor, dependent upon the unusualness of the particular vocabulary
employed in the technical reading passages. On the other hand, data
interpretation could well be a unique form of quantitative ability. Both
factors may raise interesting questions for future restructuring of the
GRE. For the purpose of the present research, however, the small numbers
of both data interpretation and technical reading comprehension items
precluded construction of separate scales for equating.

The existence of these small minor factors must be kept in mind
when comparing the results of IRT equating with conventional linear or
equipercentile equating. When confronted with two-dimensional data in
which one dimension dominates the other, LOGIST is "drawn toward" the
larger dimension as it progresses through its iterative parameter
estimation process (Reckase, 1979). Hence, the existence of a small
data interpretation factor on the quantitative scale could introduce a
discrepancy between IRT equating and conventional linear or equipercentile
equating of the quantitative scale because of a differential effect of
the data interpretation factor on these two equatings. The data
interpretation factor will influence the direction of the quantitative
true score dimension and the extent of this influence will depend upon
the size of this factor. While LOGIST may ignore this factor and iterate
toward a general quantitative dimension, conventional equatings will use
the intact true score dimension that is partially influenced by this
minor factor. Hence, on a priori grounds we expected a discrepancy
between conventional and IRT equatings due to this differential effect of
the data interpretation factor. Inspection of the fit of the IRT model to
the data interpretation items was expected to provide evidence pertaining
to the reasonableness of this hypothesis.

The preceding discussion about the potential differential effect of
the data interpretation factor on conventional and IRT equatings has
implications for the potential effect of the small technical reading
comprehension factor on the comparison of IRT and conventional equatings.
This small factor could also induce a discrepancy between the conventional
and IRT equatings. Here too, inspection of the fit of the IRT model to
the technical reading comprehension items was expected to shed light on
the reasonableness of this differential impact hypothesis.

The speed component of the verbal section is a nuisance factor that
might complicate comparisons of the results of the IRT equating with
the conventional linear or equipercentile equating. The speed component
will influence the direction of the verbal true score dimension and
consequently have an impact on conventional equating. For formula scored
tests, such as the GRE Aptitude Test, the assumption that examinees will
respond only to those items that they have reached is more tenable than it
is for number-right scored tests. To the extent that this assumption is
reasonable, the convention we chose in estimating parameters with LOGIST,
coding all consecutively omitted items at the end of an examinee's answer
sheet as not reached, should mitigate the impact of a speed component on
the parameter estimates. Hence, a priori we expected a differential
effect of speededness on the IRT and conventional equatings of the verbal
scale.
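
The not-reached convention described above amounts to a simple recoding
rule. The following is a minimal sketch of that rule, not LOGIST's own
input format: the response codes (None for an omit, "NR" for not reached)
and the function name are assumptions made for illustration.

```python
OMIT = None          # hypothetical code for an omitted item
NOT_REACHED = "NR"   # hypothetical code for an item coded not reached


def code_not_reached(responses):
    """Recode the trailing run of omits on one answer sheet.

    Omits that are followed by at least one answered item stay coded as
    omits; only the consecutive omits at the end are changed.
    """
    coded = list(responses)
    i = len(coded) - 1
    while i >= 0 and coded[i] is OMIT:
        coded[i] = NOT_REACHED
        i -= 1
    return coded


# 1 = right, 0 = wrong; the third item is a true omit, the last two
# items are treated as not reached.
sheet = [1, 0, OMIT, 1, OMIT, OMIT]
print(code_not_reached(sheet))
```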

In sum, the four factor analytic investigations of the GRE Aptitude
Test strongly suggest that separation of verbal items into reading
comprehension items and vocabulary (discrete verbal) items would yield
two sets of items that are more homogeneous than the single set of all
verbal items. The studies also suggest that data interpretation items
should be separated from other quantitative items and that technical
reading comprehension items may define another distinct set. Doubts about
the practical significance of these dimensions, coupled with the fact
that there are too few items to permit stable linking of ability scales
through IRT difficulty parameters, led us to conclude that separate
scales for data interpretation and technical reading comprehension should
not be established for the GRE IRT feasibility research.






TEST FORMS AND SAMPLES 

Test Forms

Four operational forms of the GRE Aptitude Test were used in this
study: ZGR1, K-ZGR2, K-ZGR3, and 3CGR1. The three Z-forms are composed
of four separately timed operational sections:

                                            Timing in
Section   Item Type                          Minutes    Number of Items

  I.      Verbal                               50            80
            discrete verbal                                  55
            reading comprehension                            25

 II.      Quantitative                         50            55
            quantitative comparison                          30
            data interpretation &
              regular math                                   25

III.      Analytical                           25            40
            analysis of explanations                         40

 IV.      Analytical                           25            30
            logical diagrams                                 15
            analytical reasoning                             15



The fifth section of each of the three Z-forms contained a 25-minute
set of experimental pretest items. A total of seven pretest sections were
employed in this study to link the three Z-forms of the GRE Aptitude
Test. Table 1 contains pertinent information about these seven pretest
forms: their pretest designation, item type, number of items, and number
of items used for linking. While the first three columns of Table 1 are
self-explanatory, the fourth column requires elaboration.

All pretest items are newly written items or revised items that
appear in the test in order to develop item statistics for use in assembling
operational test forms that have prespecified psychometric characteristics.
For the purpose of this study, these experimental sections provided the
item parameter links between the three Z-forms under study. For example,
the items in pretests B41 and B43 were used to link the verbal ability
scales of the GRE Aptitude Test. (Further discussion of linking of IRT
ability scales is deferred to the section on linking of ability scales
through item difficulty parameters.)

Since these pretest items were being used for the purpose of linking
IRT ability scales, which is a prerequisite for IRT score equating of
the three Z-forms, it was important to discard items with unacceptable
psychometric characteristics. The numbers appearing in the fourth column
of Table 1 are the numbers of items that survived the screening procedure
for discarding items with unacceptable psychometric characteristics.






The fourth operational form, 3CGR1, is also composed of four separately
timed operational sections:

                                            Timing in
Section   Item Type                          Minutes    Number of Items

  I.      Verbal                               50            75
            discrete verbal                                  53
            reading comprehension                            22

 II.      Quantitative                         50            55
            quantitative comparisons                         30
            data interpretation &
              regular math                                   25

III.      Analytical                           25            36
            analysis of explanations                         36

 IV.      Analytical                           25            30
            logical diagrams                                 15
            analytical reasoning                             15

Form 3CGR1 was administered with six different 25-minute fifth sections.
The items in these sections were not experimental pretest items. Instead,
they were items taken from the four operational sections of form ZGR1.
Table 2 lists the six fifth sections of 3CGR1, the number of items in the
section, and the section of ZGR1 from which they were drawn.

In addition to the seven pretest sections listed in Table 1, form
ZGR1 was administered with six other section V's at the same administration
at which form 3CGR1 was administered with the six section V's listed in
Table 2. Table 3 lists these six fifth sections of form ZGR1, indicating
the number of items in the section, and the section of 3CGR1 from which
they were drawn.

Inspection of Tables 2 and 3 reveals that each operational item from
form ZGR1 appears in one of the six section V's of form 3CGR1 and each
operational item from 3CGR1 appears in one of the six C-subforms of form
ZGR1. This commonality of items was used to study position effects.



Samples

The various forms of the GRE Aptitude Test used in this study were
administered at four different times of year. Table 4 identifies the
administration date at which each form was administered, and the sample
sizes used in this research. Note that form ZGR1 was administered
twice: in February 1980 with the B-series of pretests that were shared
with forms K-ZGR2 and K-ZGR3, and in June 1980 with the C-series of
section V's that contained operational items from form 3CGR1. Form K-ZGR2
was administered in December 1979 to a high ability population containing
scientifically oriented candidates competing for National Science Foundation
fellowships (although the fellowship candidates made up only about 5
percent of the December examinees, the potential effect of this group was
considered important).







Table 1

Experimental Sections for Forms
ZGR1, K-ZGR2 and K-ZGR3

                                            Number      Number of Items
Designation   Item Type                     of Items    Used for Linking

B41           Discrete Verbal                  55             47
B43           Reading Comprehension            25             20
B46           Quantitative Comparison          40             33
B48           Regular Math                     25             23
B50           Data Interpretation              16             12
B52           Analysis of Explanations         50             39
B53           Logical Diagrams and             16             11
              Analytical Reasoning             15             11



Table 2

Six Section V's for Form 3CGR1

                                 Number
Designation   Item Type          of Items    Location in ZGR1

C41           Verbal                39       Section I
C42           Verbal                41       Section I
C43           Quantitative          27       Section II
C44           Quantitative          28       Section II
C45           Analytical            40       Section III
C46           Analytical            30       Section IV






Table 3

Six Section V's for Form ZGR1

                                 Number
Designation   Item Type          of Items    Location in 3CGR1

C47           Verbal                37       Section I
C48           Verbal                38       Section I
C49           Quantitative          27       Section II
C50           Quantitative          28       Section II
C51           Analytical            36       Section III
C52           Analytical            30       Section IV






Table 4

Description of Samples Used in this Research

                                                     Formula Score Means and Standard Deviations
Administration           Experimental      Sample      Experimental            Operational*
Date            Form     Section           Size        Mean       SD           Mean       SD

December 1979   K-ZGR2   B41          V    [illeg.]    [illeg.]   [illeg.]     [illeg.]   [illeg.]
                K-ZGR2   B43          V    2259         5.96       4.02        36.01      15.71
                K-ZGR2   B46          Q    [illeg.]    [illeg.]   [illeg.]     27.40      10.93
                K-ZGR2   B48          Q    [illeg.]    [illeg.]   [illeg.]     26.88      10.83
                K-ZGR2   B50          Q    [illeg.]    [illeg.]   [illeg.]     27.10      10.58

February 1980   ZGR1     B41          V    [illeg.]    [illeg.]   [illeg.]     32.03      15.71
                ZGR1     B43          V    [illeg.]    [illeg.]   [illeg.]     [illeg.]   15.90
                ZGR1     B46          Q    2274        13.72       7.39        24.63       9.88
                ZGR1     B48          Q    2216        13.52       5.55        [illeg.]    9.97
                ZGR1     B50          Q    [illeg.]    [illeg.]   [illeg.]     24.55       9.93

April 1980      K-ZGR3   B41          V    [illeg.]    [illeg.]   [illeg.]     33.10      [illeg.]
                K-ZGR3   B43          V    [illeg.]    [illeg.]   [illeg.]     [illeg.]   14.70
                K-ZGR3   B46          Q    [illeg.]    [illeg.]   [illeg.]     [illeg.]   11.16
                K-ZGR3   B48          Q    [illeg.]    [illeg.]   [illeg.]     [illeg.]   11.26
                K-ZGR3   B50          Q    2414         6.56       2.67        [illeg.]   11.22

June 1980       ZGR1     C47          V    2483        13.23       8.01        31.61      15.86
                ZGR1     C48          V    2486        14.62       8.10        31.53      16.30
                ZGR1     C49          Q    2498        11.94       6.43        24.46      [illeg.]
                ZGR1     C50          Q    2484        12.88       5.93        24.26      10.34
                ZGR1     C51          A    2488        18.73       9.13        32.69      15.21
                ZGR1     C52          A    2482        14.14       6.98        32.69      15.66
                3CGR1    C41          V    1489        15.54       8.42        [illeg.]   15.38
                3CGR1    C42          V    1495        15.91       8.80        30.43      15.55
                3CGR1    C43          Q    1487        11.65       5.59        24.94      11.51
                3CGR1    C44          Q    1497        12.27       5.43        24.41      11.74
                3CGR1    C45          A    1526        24.26      11.75        28.86      15.41
                3CGR1    C46          A    1476        15.92       7.19        28.52      14.87

*Operational formula raw scores are for the operational section corresponding to the
pretest section listed in column three.






PARAMETER ESTIMATION AND ITEM LINKING 
Item Calibration Procedures 






Data from four administrations were used in this research to assess
the feasibility of using item response theory as a psychometric model
for the GRE Aptitude Test. A total of 10 different item types (see Table
5) were administered within each form. All item parameter estimates and
ability estimates were obtained with the program LOGIST (Wood, Wingersky, &
Lord, 1978). The function of LOGIST is to estimate, for each item, the
three item parameters of the three-parameter logistic model: a
(discrimination), b (difficulty), and c (pseudoguessing parameter); and,
for each examinee, θ (ability). The following constraints were imposed on
the estimation process: a was restricted to values between 0.01 and 1.50
inclusive, except for analytical item calibrations where the upper bound
was 1.20; the lower limit for θ was -7; and c was restricted to values
between 0.0 and 0.5. Additionally, each examinee was required to have
responded to at least 20 items in order to ensure stable θ estimates.
Choosing appropriate constraints is a complex procedure, but necessary to
speed convergence and produce stable estimates.
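
For readers who want the model in concrete form, the sketch below writes
out the three-parameter logistic item response function and the bounds
listed above. It is an illustration, not LOGIST code: the scaling
constant D = 1.7 is a common convention assumed here rather than stated
in this section, and the function names are hypothetical.

```python
from math import exp

D = 1.7  # assumed scaling constant; not stated in this section


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))


def clamp(value, low, high):
    """Force value into the closed interval [low, high]."""
    return max(low, min(high, value))


def constrain_item(a, c, analytical=False):
    """Apply the bounds on a and c described in the text."""
    a_max = 1.20 if analytical else 1.50
    return clamp(a, 0.01, a_max), clamp(c, 0.0, 0.5)


def constrain_theta(theta):
    """Apply the lower limit of -7 on ability estimates."""
    return max(theta, -7.0)


# An examinee whose ability equals the item difficulty answers correctly
# with probability c + (1 - c) / 2.
print(p_correct(theta=0.0, a=1.0, b=0.0, c=0.2))
```

The pseudoguessing parameter c sets the lower asymptote, so a very low
ability examinee still has a probability near c of answering correctly.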

For each administration, from four to six different item calibrations
were performed. Table 5 shows the relationship between the item types,
calibrations and sections of the GRE Aptitude Test. Every item belongs to
one item type, but may have been calibrated with more than one set of
items (e.g., every analogy item was calibrated with all verbal items and
with discrete verbal items only), may have been calibrated more than
once with the same set of items in the same relative positions (e.g., all
quantitative items on form ZGR1 were calibrated twice, once when administered
in February 1980 and once when administered in June 1980), or may have
been calibrated with the same set of items in different positions (e.g.,
every verbal item appearing in section I of form ZGR1 also appeared in a
section V of form 3CGR1).



Item Linking Plan

Any meaningful comparisons between item parameters or ability estimates
require a common metric (Dorans, 1979). Consequently, the linking plan
used to place the item and ability estimates on a common scale is an
important aspect of this research. Figures 2 through 5 depict the
item linking plans employed. The various verbal item linkings are portrayed
in Figures 2 through 4. Figure 2 displays the strategy used to link
all the verbal item types. Each of the four test forms is represented by
a rectangle attached to a square. The rectangle contains information
about the operational section of the test: section number and the number
of items. The square contains information about section V (experimental)
of the test: subform designation and number of items. Test forms are
ordered vertically by administration date. For example, test form ZGR1,
administered in February 1980, is represented by the operational rectangle
containing I, section number, and 80, number of items, connected to the
experimental test square containing B41 and B43, subform designations, and


Table 5

Relationships Between Item Type, Calibrations,
and Sections of the GRE Aptitude Test

                                           Calibrations
Item Types                   Subsections                 Sections

analogies                    discrete verbal             verbal
antonyms                     discrete verbal             verbal
sentence completions         discrete verbal             verbal
reading comprehension        reading comprehension       verbal

regular mathematics                                      quantitative
data interpretation                                      quantitative
quantitative comparison      quantitative comparison     quantitative

analysis of explanations                                 analytical
logical diagrams                                         analytical
analytical reasoning                                     analytical






Figure 2

IRT Linking Plan for Verbal Scales of GRE Aptitude Test

[Diagram: forms K-ZGR2 (December 1979), ZGR1 (February 1980), K-ZGR3
(April 1980), and ZGR1 and 3CGR1 (June 1980), ordered by administration
date. Linking sections shown: B41 (47 items), B43 (20 items), C47 (37
items), C48 (38 items), C41 (39 items), and C42 (41 items).]



47 and 20, number of linking items in pretests B41 and B43, respectively.
Note in Figure 2 that Form ZGR1, administered in June 1980, is the base
form, that forms K-ZGR2 and K-ZGR3 are linked to the February 1980
administration of form ZGR1 through pretest sections B41 and B43, that
the February 1980 administration of form ZGR1 is linked to the June 1980
administration by the 80 operational items of section I, and that form
3CGR1 is linked to form ZGR1 through spiralling at the June 1980
administration.
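
A chain of linkings such as K-ZGR2 to ZGR1 (February 1980) to ZGR1 (June
1980) can be composed into a single transformation onto the base scale.
The sketch below assumes each linking is a linear transformation of the
ability scale, theta* = A * theta + B, under which difficulty transforms
like theta, discrimination is divided by A, and the pseudoguessing
parameter is unchanged. The function names and the coefficients are made
up for illustration and are not values from this study.

```python
def compose(first, second):
    """Return (A, B) for applying `first` and then `second`.

    Each link is a pair (A, B) meaning x -> A * x + B.
    """
    a1, b1 = first
    a2, b2 = second
    return a2 * a1, a2 * b1 + b2


def rescale_item(a, b, c, link):
    """Place one item's 3PL parameters on the target scale."""
    slope, intercept = link
    return a / slope, slope * b + intercept, c


# Made-up coefficients, for illustration only.
kzgr2_to_zgr1_feb = (1.10, -0.20)
zgr1_feb_to_zgr1_jun = (0.95, 0.05)
kzgr2_to_base = compose(kzgr2_to_zgr1_feb, zgr1_feb_to_zgr1_jun)
print(kzgr2_to_base)
print(rescale_item(1.0, 0.5, 0.2, kzgr2_to_base))
```

Composing the links first and then rescaling once gives the same result
as rescaling through each intermediate scale in turn.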

As stated at the end of the factor analytic review, a decision was
made to separate the reading comprehension items from the discrete verbal
items to establish distinct reading comprehension and discrete verbal
scales. Hence, in addition to having been calibrated with all verbal
items, each discrete verbal item was calibrated with discrete verbal items
only and each reading comprehension item was calibrated with reading
comprehension items only. After calibration, each discrete verbal item
set was placed on its parent verbal scale. These discrete verbal to
verbal scale linkings are depicted in Figure 3.

Each combination (test form/administration date) is represented by
two rectangles in Figure 3: an all verbal rectangle and a discrete verbal
rectangle. Each all verbal rectangle is partitioned into a three-by-three
matrix. The first column of each of these matrices contains a section
designation. The second and third columns contain the number of reading
comprehension items and the number of discrete verbal items respectively.
For example, the matrix for the June 1980 administration of Form ZGR1
indicates that Section I contained 25 reading comprehension items and 55
discrete verbal items, the C47 experimental section contained 11 reading
comprehension items and 26 discrete verbal items, and the C48 experimental
section contained 11 reading comprehension items and 27 discrete verbal
items.

For each all verbal rectangle there is a corresponding discrete
verbal rectangle that contains the portion of the information contained
in the all verbal rectangle that defines the common item link, i.e., the
section designation and the number of common discrete verbal items. The
arrows in the figure define the direction of the various linkings, which
all culminate at the ZGR1 (6/80) all verbal rectangle. For example, the
two ZGR1 (2/80) rectangles indicate that the discrete verbal item and
ability parameters of ZGR1 (2/80) were placed on the verbal base scale of
form ZGR1 (6/80) by a ZGR1 (2/80) to ZGR1 (6/80) all verbal linking via
the 80 operational items of section I, and a ZGR1 (6/80) to ZGR1 (6/80)
discrete verbal to all verbal linking via the 55 discrete verbal items
of section I and the 47 discrete verbal items of pretest B41.

Figure 4 depicts the IRT linking plan for reading comprehension
scales of the GRE Aptitude Test. It is similar in format to Figure 3.
Each reading comprehension rectangle contains the section designations
and number of reading comprehension items used to place each reading
comprehension scale on its parent verbal scale.

Figure 5 depicts the IRT linking plan for the quantitative scales
of the GRE Aptitude Test. It is similar in format to Figure 2.







Figure 3

IRT Linking Plan for Discrete Verbal Scales of GRE Aptitude Test

                         All Verbal            Discrete Verbal
Form/Admin. Date         Sec.    RC    DV      Sec.    DV

3CGR1                    I       22    53      I       53
June 1980                C41     14    25      C41     25
                         C42     11    30      C42     30

   (3CGR1 and ZGR1, June 1980: Linked by Spiralling)

ZGR1                     I       25    55      I       55
June 1980                C47     11    26      C47     26
                         C48     11    27      C48     27

ZGR1                     I       25    55      I       55
February 1980            B41      0    47      B41     47
                         B43     20     0

K-ZGR2                   I       25    55      I       55
December 1979            B41      0    47      B41     47
                         B43     20     0

K-ZGR3                   I       25    55      I       55
April 1980               B41      0    47      B41     47
                         B43     20     0

RC = number of reading comprehension items; DV = number of discrete
verbal items. [The arrows showing the direction of each linking are not
reproduced.]



Figure 4

IRT Linking Plan for Reading Comprehension Scales of GRE Aptitude Test

                         All Verbal            Reading Comprehension
Form/Admin. Date         Sec.    DV    RC      Sec.    RC

3CGR1                    I       53    22      I       22
June 1980                C41     25    14      C41     14
                         C42     30    11      C42     11

   (3CGR1 and ZGR1, June 1980: Linked by Spiralling)

ZGR1                     I       55    25      I       25
June 1980                C47     26    11      C47     11
                         C48     27    11      C48     11

ZGR1                     I       55    25      I       25
February 1980            B43      0    20      B43     20
                         B41     47     0

K-ZGR2                   I       55    25      I       25
December 1979            B43      0    20      B43     20
                         B41     47     0

K-ZGR3                   I       55    25      I       25
April 1980               B43      0    20      B43     20
                         B41     47     0

DV = number of discrete verbal items; RC = number of reading
comprehension items. [The arrows showing the direction of each linking
are not reproduced.]



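Figure 5

IRT Linking Plan for Quantitative Scales of GRE Aptitude Test

[Diagram not reproduced. Legible entries: Forms ZGR1, K-ZGR2, K-ZGR3, and
3CGR1 arranged against the administration dates December 1979, February
1980, April 1980, and June 1980; Section II with 55 items for each
form/administration; pretest sections B46 (33 items), B48 (23 items), and
B50 (12 items); sections C49 (27 items) and C50 (28 items); sections
labelled 43 (27 items) and 44 (28 items); and the annotation "Linked by
Spiralling."]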



IRT Linking Procedures 

Two procedures were used to place item parameter and ability estimates
on the same metric: spiralling of test forms and a common item linking
procedure developed by Lord and Stocking (Petersen, Cook, & Stocking,
1981). Spiralling of test forms at the June 1980 administration of the
GRE Aptitude Test was used to link parameter estimates on Form 3CGR1 to
parameter estimates on the base form, Form ZGR1. The common item linking
procedure was used for all other item linkings.

Linking by spiralling assumes that alternating forms administered to
examinees results in a random assignment of forms to examinees. Since large
equivalent groups take each form, the distributions of ability in the two
groups should be the same, and separate parameterizations based on these
two random groups via separate LOGIST runs should produce a single ability
metric.
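
The random-groups reasoning behind spiralling can be illustrated with a small simulation. This is a sketch only and not part of the original study: the function name `spiral`, the simulated abilities, and the sample size are hypothetical, and the sketch assumes that the order in which examinees receive booklets is unrelated to ability.

```python
import random
from statistics import mean

def spiral(n_examinees, forms):
    """Assign forms in strict alternation, as when booklets are packaged
    in spiralled order (form 1, form 2, form 1, ...)."""
    return [forms[i % len(forms)] for i in range(n_examinees)]

# Hypothetical abilities, generated in an order unrelated to ability.
rng = random.Random(1980)
abilities = [rng.gauss(0.0, 1.0) for _ in range(20000)]

assignment = spiral(len(abilities), ["ZGR1", "3CGR1"])
groups = {"ZGR1": [], "3CGR1": []}
for theta, form in zip(abilities, assignment):
    groups[form].append(theta)

# With large groups, the two ability distributions should nearly coincide.
mean_gap = abs(mean(groups["ZGR1"]) - mean(groups["3CGR1"]))
print(len(groups["ZGR1"]), len(groups["3CGR1"]), round(mean_gap, 3))
```

Because each group is a large, effectively random half of the examinees, separate calibrations that each fix the ability metric at mean zero and unit standard deviation should land on approximately the same scale.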

The Lord-Stocking linking procedure produces robust estimates of
location and scale of each distribution of item difficulties and an
equation based on these robust estimates of location and scale. This
equation is used to convert the parameter estimates of a set of items on
one form from the arbitrary metric produced by the LOGIST calibration of
those items on that form to the base metric resulting from the calibration
of the June 1980 administration of Form ZGR1 items. A step-by-step
description of the linking of Form K-ZGR3 verbal items is used to illustrate
the procedure.

From Figure 2, we see that the K-ZGR3 items are linked to the base
form ZGR1, administered in June 1980, via two pathways. The first step in
both pathways is to link the February 1980 administration of ZGR1 verbal
items to the June 1980 administration of ZGR1 via the 80 shared items from
Section I. The end result of this procedure is the transformation of
parameter estimates from the February 1980 administration of ZGR1 to the
base metric of the June 1980 administration of ZGR1. One pathway directly
links K-ZGR3 to the transformed ZGR1 (of 2/80) metric via the 67 shared
items of pretest sections B41 and B43. The second pathway links K-ZGR3 to
ZGR1 through Form K-ZGR2. The first step in both pathways, the linking of
the two ZGR1 administrations, will be used to illustrate the Lord-Stocking
procedure.

We start with two sets of item difficulty estimates, one from each
administration of ZGR1. Each difficulty estimate is weighted by the
reciprocal of its squared standard error of estimate; for each item, the
larger of its two standard errors of estimate (from the two
estimates of item parameters) is used. Then the means and standard
deviations of these weighted item difficulty estimates are computed and
used to obtain the conversion line that converts the mean and standard
deviation of the February 1980 estimates to the mean and standard deviation
of the June 1980 estimates. At this point the process becomes iterative.
The perpendicular distances of the item difficulty points from this
conversion line are computed, and then biweights (Mosteller & Tukey,
1977, p. 205) for these distances are obtained. These biweights are
then applied to the reweighted points and a new conversion line is produced.
The distance, biweight, reweighting, and new conversion line cycle is
repeated until the maximum change in perpendicular distance is less than
some criterion. The last conversion line produced by this process is then
used to place the February 1980 items on the June 1980 metric. The
results of the linking of the two administrations of verbal items appear
in Table 6. The final conversion line has a slope of .9960 and an intercept
of .0092.
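
The cycle just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in the study: the function names, the tuning constant c = 6, the use of the median absolute distance as the scale estimate, the small floor on that scale, and the stopping tolerance are all choices made here for illustration, and the data are synthetic.

```python
import math
from statistics import median

def weighted_mean_sd(values, weights):
    """Weighted mean and (population) standard deviation."""
    total = sum(weights)
    m = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / total
    return m, math.sqrt(var)

def conversion_line(x, y, weights):
    """Line that maps the weighted mean and SD of x onto those of y."""
    mx, sx = weighted_mean_sd(x, weights)
    my, sy = weighted_mean_sd(y, weights)
    slope = sy / sx
    return slope, my - slope * mx

def perpendicular_distances(x, y, slope, intercept):
    norm = math.sqrt(1.0 + slope ** 2)
    return [(yi - slope * xi - intercept) / norm for xi, yi in zip(x, y)]

def biweights(distances, c=6.0, eps=1e-12):
    """Tukey biweights; c and the median-based scale are assumptions."""
    scale = max(median([abs(d) for d in distances]), eps)
    out = []
    for d in distances:
        u = d / (c * scale)
        out.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return out

def robust_link(b_new, b_base, se_new, se_base, tol=1e-4, max_iter=50):
    """Return (slope, intercept) placing b_new on the metric of b_base."""
    # Initial weight: reciprocal of the larger squared standard error.
    base_w = [1.0 / max(s1, s2) ** 2 for s1, s2 in zip(se_new, se_base)]
    slope, intercept = conversion_line(b_new, b_base, base_w)
    dist = perpendicular_distances(b_new, b_base, slope, intercept)
    for _ in range(max_iter):
        weights = [w * bw for w, bw in zip(base_w, biweights(dist))]
        slope, intercept = conversion_line(b_new, b_base, weights)
        new_dist = perpendicular_distances(b_new, b_base, slope, intercept)
        change = max(abs(a - b) for a, b in zip(new_dist, dist))
        dist = new_dist
        if change < tol:
            break
    return slope, intercept

# Synthetic example: nine common items on a line, one made aberrant.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.9 * v + 0.2 for v in x]
y[4] += 3.0
se = [0.1] * len(x)
slope, intercept = robust_link(x, y, se, se)
print(round(slope, 4), round(intercept, 4))
```

In the synthetic example the aberrant item ends up with essentially no weight, which mirrors the treatment of extreme outliers described for the verbal linkings.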

Results of Linking Test Forms

Tables 6, 7, 8, and 9 contain the results of the linkings depicted in
Figures 2, 3, 4, and 5, respectively. The verbal linking results are presented
in Table 6. Perusal of this table reveals that, with the exception of
Form K-ZGR2, the scale transformations produced only slight changes in
location and scale; that is, α and β approach 1.0 and 0.0, respectively.
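
How a conversion line acts on item and ability parameters can be shown with a short sketch. The slope and intercept are the ZGR1 (2/80) values reported for the verbal linking; everything else is an assumption made for illustration: the item parameters and ability value are hypothetical, the function names are invented here, and the scaling constant D = 1.7 is the conventional choice for the three-parameter logistic model rather than something stated in this section.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed here)

def p_correct(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def to_base_metric(a, b, c, slope, intercept):
    """Re-express item parameters on the base metric.
    Difficulty follows the conversion line, discrimination is divided
    by the slope, and the lower asymptote is unchanged."""
    return a / slope, slope * b + intercept, c

slope, intercept = 0.9960, 0.0092   # ZGR1 (2/80) verbal linking
a, b, c = 1.0, 0.5, 0.2             # hypothetical item
theta = 0.3                          # hypothetical ability

a_t, b_t, c_t = to_base_metric(a, b, c, slope, intercept)
theta_t = slope * theta + intercept  # abilities follow the same line

# The response probability is unchanged by the change of metric.
print(round(b_t, 4), round(p_correct(theta, a, b, c), 6),
      round(p_correct(theta_t, a_t, b_t, c_t), 6))
```

A slope near 1.0 and an intercept near 0.0 therefore leave the parameter estimates almost unchanged, which is the sense in which the transformations are described as producing only slight changes in location and scale.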

The four weighted correlations in Table 6 are all very high, as
should be expected. Visual evidence of this can be seen in Figure 6,
which is a scatter plot of difficulties for the 80 common verbal items
used to link the two ZGR1 administrations. The noticeable outlier in this
plot is item 78, which had a b of -.288 on ZGR1 (6/80) and a transformed
difficulty of .729 on ZGR1 (2/80). It should be noted that an outlier as
extreme as this gets very little weight compared to the other data points.
Except for this peculiar outlier, Figure 6 is typical of difficulty
scatter plots for all four verbal linkings.

The review of factor analytic research on the GRE Aptitude Test
suggested separation of verbal items into mutually exclusive discrete
verbal and reading comprehension sets. Table 7 contains the results of
the six discrete verbal linkings, which placed the discrete verbal items
onto the metric of the verbal items after the latter had been transformed
to the base metric of ZGR1 (6/80). With the exception of the K-ZGR2
transformation, only slight shifts in scale and location were required to
convert the discrete verbal scales to the metric of their parent verbal
scales.

Table 8 contains the results for the six reading comprehension
linkings. A striking feature of Table 8 is the consistently large value for
the intercept when scaling reading comprehension items to the verbal
scale. (The K-ZGR2 intercept is somewhat larger than the other five.)
This finding should not influence model fit or equating and is easily
explained. The examinees whose responses were used to estimate the item
parameters in the reading comprehension calibrations were more able than
those examinees whose responses were used in the verbal calibrations.
This is due to our choice of a minimum number of items to which examinees
must have responded in order to be included in the calibration procedure.

Consider the reading comprehension calibrations for form K-ZGR2. 
Examinees who responded to fewer than 20 of the 45 reading comprehension 
items were dropped from the calibration, i.e., their item responses were 






Table 6

Results of Verbal Item Linkings: Correlations
(r) Between Weighted Difficulties; and Conversion
Equation Parameters, Slope (α) and Intercept (β)

"Old" Form          "New" Form          n      r        α        β

ZGR1(6/80)          ZGR1(2/80)          80    .9968    .9960    .0092
ZGR1(2/80)T*        K-ZGR2(12/79)       67    .9912    .9401    .1776
ZGR1(2/80)T         K-ZGR3(4/80)        67    .9942    .9906   -.0282
K-ZGR2(12/79)T      K-ZGR3(4/80)        67    .9873    .9907   -.0338

Table 7

Results of Discrete Verbal Item Linkings:
Correlations (r) Between Weighted Difficulties;
and Conversion Equation Parameters, Slope (α)
and Intercept (β)

Form                 n       r        α         β

ZGR1(6/80)          108    .9996    1.0005    .0377
ZGR1(2/80)T         102    .9995     .9823    .0582
K-ZGR2(12/79)T      102    .9994     .9215    .2406
K-ZGR3(4/80)T1      102    .9997     .9952    .0032
K-ZGR3(4/80)T2      102    .9997     .9953   -.0025
3CGR1(6/80)         108    .9996    1.0143    .0228

*A T suffixed to the "old" form designation indicates the transformation
is to scale via an "old" form whose parameter estimates have already been
transformed.






Table 8

Results of Reading Comprehension Linkings:
Correlations (r) Between Weighted
Difficulties; and Conversion Equation
Parameters, Slope (α) and Intercept (β)

Form                 n      r        α        β

ZGR1(6/80)          47    .9965    .9528    .1936
ZGR1(2/80)T         45    .9968    .9792    .2250
K-ZGR2(12/79)T      45    .9973    .9753    .3333
K-ZGR3(4/80)T1      45    .9947    .9512    .1726
K-ZGR3(4/80)T2      45    .9947    .9514    .1670
3CGR1(6/80)         47    .9960    .9594    .1925



Table 9

Results of Quantitative Linkings:
Correlations (r) Between Weighted
Difficulties; and Conversion Equation
Parameters, Slope (α) and Intercept (β)

"Old" Form          "New" Form          n      r        α        β

ZGR1(6/80)          ZGR1(2/80)          55    .9980    .9549    .03798
ZGR1(2/80)T         K-ZGR2(12/79)       68    .9921    .9799    .2495
ZGR1(2/80)T         K-ZGR3(4/80)        68    .9890    .9690    .0485
K-ZGR2(12/79)T      K-ZGR3(4/80)        68    .9921    .9860    .0477







ignored and no ability estimates were produced for them. For the verbal
calibrations, examinees had to respond to at least 20 of the 145 to 147
items to be retained in the analysis. On the average, approximately 600
more examinees were dropped from the reading comprehension calibrations
than were dropped from the verbal calibrations. Since these 600 examinees
answered very few items, they were probably mostly examinees of very
low ability. Since LOGIST uses an arbitrary ability metric having a mean
of zero and a standard deviation of one, the item difficulty estimates
obtained in the more able reading comprehension group are lower than the
estimates obtained when all verbal items were calibrated. Figure 8, which
is a scatterplot of item difficulties for Form K-ZGR2, illustrates this
effect. Note that the conversion line for putting reading comprehension
item difficulty estimates on the verbal item scale is essentially parallel
to the main diagonal. This difference in item difficulty estimates
reflects a true difference in ability in the two calibration groups.
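
The direction of this shift can be reproduced with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the GRE data: abilities are simulated as standard normal, the rule of dropping the lowest 5 percent is a hypothetical stand-in for the minimum-response rule, and the function names are invented here.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1979)
abilities = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Hypothetical exclusion rule: drop the lowest 5 percent of abilities.
cutoff = sorted(abilities)[len(abilities) // 20]
retained = [t for t in abilities if t >= cutoff]

# Location and scale of the retained group on the full-group metric.
mu, sigma = mean(retained), pstdev(retained)

def on_subgroup_metric(b):
    """Difficulty re-expressed so the retained group has mean 0, SD 1."""
    return (b - mu) / sigma

def back_to_full_metric(b_sub):
    """Conversion line from the subgroup metric to the full-group metric:
    slope = sigma, intercept = mu."""
    return sigma * b_sub + mu

print(round(mu, 3), round(sigma, 3), round(on_subgroup_metric(0.5), 3))
```

Dropping low-ability examinees yields a positive intercept (the retained group's mean) and a slope somewhat below one (its reduced spread), the same qualitative pattern as the reading comprehension conversion equations; the size of the effect in the sketch depends entirely on the hypothetical exclusion rule.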

Figures 9, 10, and 11 contain typical scatterplots of the transformed-
to-scale item discrimination estimates. Figure 9 depicts the relationship
between the a's of all the verbal items common to both Form K-ZGR2 and
Form ZGR1. Since each estimated a has been transformed to scale, the
points should fall along the true diagonal indicated in Figure 9. Though
the scatter is greater than that on the plots of b estimates, there is no
evidence of any systematic departure from the diagonal. Figure 10 depicts
the relationship between the transformed-to-scale a's from the discrete
verbal calibration of Form K-ZGR2 with the transformed a's from the all
verbal calibration of that form. Figure 11 shows the relationship between
the reading comprehension and the all verbal a's. Note the preponderance
of points to the left of the main diagonal. The discrimination parameter
estimates for the 45 reading comprehension items are higher when calibrated
alone than when calibrated with the discrete verbal items, suggesting that
two different, though highly correlated, scales are defined by the two
different calibrations. Compare this with Figure 10, which shows a much
smaller effect for the discrete verbal linkings.

Table 9 contains the results of the quantitative linkings. Examination
of this table yields an observation similar to that produced by
examining Table 6: With the exception of Form K-ZGR2, the difficulty
parameter transformations produced only slight changes in location and
scale. The sizeable shift in location for K-ZGR2 may be attributable
to the higher ability sample, containing National Science Foundation
fellowship candidates, at the December administration of the GRE Aptitude
Test. All four correlations in Table 9 are very high.






ASSESSING THE WEAK FORM OF LOCAL INDEPENDENCE:
EXAMINATION OF PARTIAL CORRELATIONS AMONG GRE ITEMS CONTROLLING FOR
EXAMINEE ABILITY

Implications of Local Independence 

The strong form of local independence states that, for a given 
ability level, item responses are statistically independent. The weak 
form of local independence states that, for a given ability level, item 
responses are linearly independent, i.e., uncorrelated. If local independ- 
ence held and actual ability scores were available, then the partial 
correlations among items with ability partialled out would be zero. Since 
the responses to each item go into the ability estimates, however, slightly 
negative intercorrelations among the items are expected when these ability 
estimates are partialled out because of part-total contamination (Lord, 
1980b). 
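
The quantity examined in this section is the first-order partial correlation between two items with ability held constant. The sketch below shows the standard formula only; the function name and the numerical inputs are hypothetical, and the study's own computations used tetrachoric inter-item correlations and biserial item-ability correlations, optionally corrected for guessing, as the inputs.

```python
import math

def partial_correlation(r_ij, r_i_theta, r_j_theta):
    """Correlation between items i and j with ability partialled out.

    r_ij       correlation between the two items
    r_i_theta  correlation of item i with the ability estimate
    r_j_theta  correlation of item j with the ability estimate
    """
    numerator = r_ij - r_i_theta * r_j_theta
    denominator = math.sqrt((1.0 - r_i_theta ** 2) * (1.0 - r_j_theta ** 2))
    return numerator / denominator

# Hypothetical values: the inter-item correlation is fully explained by
# ability in the first case and only partly explained in the second.
print(partial_correlation(0.25, 0.5, 0.5))
print(partial_correlation(0.40, 0.5, 0.5))
```

Under the weak form of local independence the first case is the expected one: once ability is removed, nothing is left for the two items to share. A clearly positive value, as in the second case, flags a pair of items that have something in common beyond the ability being measured.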

Theta estimates were read from data sets created by previous LOGIST
runs, while item responses were read from separate data sets and were
recorded as 1 = correct and 0 = incorrect (incorrect responses included,
therefore, omitted and not-reached items as well as incorrectly marked
items). Biserials and point biserials with either verbal or quantitative
ability estimates, and tetrachoric and partial correlations were calculated
for items in the verbal and quantitative subtests for two GRE test forms.
For each subtest, two runs were made: one with a correction for guessing
(Carroll, 1945; Swinton, 1980) and one without this correction. The
matrices of partial tetrachoric correlations were then factor analyzed.
It was hoped that a linear factor analysis after first removing the
variance due to the dominant (and nonlinearly derived) first factor
would present a clearer picture than previous factor analytic studies.



Analysis of Partial Correlations

The partial correlations were examined to identify items that
correlated highly among themselves (i.e., items that violated the
assumption of the weak form of local independence). It was anticipated
that an item would be more likely to correlate highly with an item of its
own nominal type than it would with other items in the test and that items
at the end of a speeded section would be highly intercorrelated. Moreover,
it was expected that the percentage of high positive correlations among
items of the same type would be greater than the percentage of high
correlations for all items in the test. These expectations were borne
out, in some cases rather dramatically, and the results will be discussed
in the following sections.

The results from the administration of two GRE test forms, ZGR1
(6/80) and K-ZGR2 (12/79), were examined. As previously stated, the
latter form was administered to a sample of above average ability, and
so some differences in the distributions of correlations were expected.
The differences between distributions obtained from the two forms were,
however, slight and nonsystematic. Moreover, results from both forms
tended to attest to high correlations among technical reading comprehension
items on the verbal subtest and among data interpretation items on the
quantitative subtest, and some of the less marked results were also similar
across test forms.

Correction for guessing. As stated above, the correlations were
obtained both with and without a correction for guessing. When a
correction for guessing was made, an initial set of chance-level parameters
(equal to .20 for the 80 five-choice verbal items and the 25 five-choice
quantitative items and equal to .25 for the 30 remaining four-choice
quantitative items) was used. These initial estimates were adjusted
downward, based on the data, for some items in order to avoid nonsingular
correlation matrices. The overall effect of the correction for guessing
was to spread out the distribution of partial correlations. It was
suspected that in some cases the procedure might have overcorrected for
guessing since some partial correlations greater than 1.0 or less than
-1.0 were obtained. Both tetrachorics and biserial correlations were
corrected for guessing and the result on the partial correlations, which
involve ratios containing both tetrachorics and biserials, may have been
an overcorrection. Alternatively, these extreme correlations might simply
be due to sampling error. In either case, the homogeneity of some nominal
item types was more apparent after correction for guessing.



Results for the Verbal Subtest 

The 80 GRE verbal items were broken down into the following five
nominal item types for the purpose of this analysis:

     Item type                              Number of items

     Sentence completions                         17
     Analogies                                    18
     Antonyms                                     20
     Reading comprehension                        14
     Technical reading comprehension              11

The first three item types (which comprise the class of discrete
verbal items) occurred both at the beginning and at the end of the verbal
test, while the reading comprehension and technical reading comprehension
items were found in the middle of the test. The placement of the discrete
verbal items introduces, therefore, the nuisance factor of speededness.
Unusually high partial correlations were found among the final 15 or so
items in a separately timed section, regardless of their nominal item
type. Certainly, this is in part a result of the fact that almost half
the examinees did not reach the final items and that those who did attempt
these items tended to get them correct. The speededness factor, therefore,
complicates the analysis, as does the large number of systematic omissions
for some reading passages. Both of these factors will be considered as we
turn to the results for each nominal item type.






Table 10

Factor Pattern and Intercorrelations Among
Residual Factors Extracted from Form ZGR1
Verbal Item Correlation Matrix in which
Overall Verbal Ability Estimates Have Been Partialled Out

Item Type                          Item Position   Factor I    Factor II

Sentence Completion                      1          -0.053      -0.023
                                         2           0.027      -0.032
                                         3          -0.079      -0.009
                                         4           0.063       0.025
                                         5          -0.269      -0.108
                                         6          -0.018      -0.035
                                         7           0.035       0.017
                                         8          -0.031      -0.063
                                        53          -0.125      -0.194
                                        54           0.036       0.011
                                        55           0.114      -0.051
                                        56          -0.741      -0.231
                                        57           0.130      -0.013
                                        58           0.103       0.012
                                        59          -0.037      -0.026
                                        60           0.496      -0.054
                                        61           0.388       0.032

Analogies                                9          -0.065      -0.024
                                        10           0.031      -0.100
                                        11           0.539      -0.094
                                        12          -0.183      -0.302
                                        13           0.286      -0.143
                                        14           0.093      -0.061
                                        15           0.309      -0.055
                                        16           0.194       0.005
                                        17           0.235      -0.075
                                        62           0.343       0.012
                                        63          -0.032      -0.000
                                        64          -0.158      -0.059
                                        65          -0.430      -0.137
                                        66          -0.186       0.054
                                        67           0.065       0.100
                                        68           0.031       0.077
                                        69           0.071       0.050
                                        70          -0.026       0.034

Antonyms                                18          -0.114      [illeg.]
                                        19          -0.018       0.078
                                        20           0.038       0.136
                                        21           0.011       0.131
                                        22           0.041       0.152
                                        23          -0.148       0.188
                                        24           0.164       0.220
                                        25           0.045       0.231
                                        26           0.089       0.202
                                        27          -0.106       0.699
                                        71          -0.160       0.837
                                        72          -0.243       0.909
                                        73          -0.147       0.692
                                        74          [illeg.]    [illeg.]
                                        75          -0.150       0.834
                                        76           0.018       0.798
                                        77           0.012       0.668
                                        78           0.277       0.001
                                        79           0.278       0.065
                                        80           0.420      -0.011

Reading Comprehension                   28           0.258       0.030
                                        29           0.400      -0.010
                                        30           0.250       0.048
                                        34           0.393       0.060
                                        35           0.467       0.010
                                        36           0.402       0.027
                                        37           0.437       0.022
                                        38          [illeg.]    [illeg.]
                                        39           0.480       0.021
                                        40           0.531      -0.034
                                        41           0.580       0.023
                                        42           0.630       0.024
                                        43           0.571      -0.046
                                        44           0.476       0.096

Technical Reading Comprehension         31           0.592      -0.011
                                        32           0.579      -0.022
                                        33           0.673       0.008
                                        45           0.671      -0.081
                                        46           0.657      -0.022
                                        47           0.699      -0.088
                                        48           0.568      -0.017
                                        49           0.608      -0.066
                                        50          [illeg.]    -0.090
                                        51          [illeg.]    -0.057
                                        52           0.588      -0.046

Factor Intercorrelations

                 Factor I    Factor II
Factor I          1.000        .154
Factor II          .154       1.000

[illeg.] marks an entry that is illegible in the source copy.






Table 11

Factor Pattern and Intercorrelations Among
Residual Factors Extracted from Form K-ZGR2
Verbal Item Correlation Matrix in which
Overall Verbal Ability Estimates Have Been Partialled Out

Item Type                          Item Position   Factor I    Factor II

Sentence Completion                      1          [illeg.]    [illeg.]
                                         2          [illeg.]    [illeg.]
                                         3          [illeg.]    -0.170
                                         4          -0.098       0.044
                                         5          -0.318      -0.099
                                         6          [illeg.]    [illeg.]
                                         7          [illeg.]    [illeg.]
                                         8          [illeg.]    -0.007
                                        53          [illeg.]    [illeg.]
                                        54          [illeg.]    [illeg.]
                                        55          [illeg.]    -0.140
                                        56          [illeg.]    -0.215
                                        57          [illeg.]    -0.062
                                        58          [illeg.]    -0.096
                                        59          [illeg.]     0.006
                                        60          [illeg.]    -0.035
                                        61          [illeg.]    -0.086

Analogies                                9          [illeg.]    [illeg.]
                                        10          [illeg.]    -0.534
                                        11          [illeg.]    -0.111
                                        12          [illeg.]    -0.022
                                        13          -0.165      -0.230
                                        14           0.385       0.031
                                        15          [illeg.]     0.016
                                        16          [illeg.]    [illeg.]
                                        17          [illeg.]    -0.133
                                        62          [illeg.]    -0.062
                                        63           0.037       0.039
                                        64          -0.295      -0.107
                                        65          -0.277      -0.044
                                        66          -0.041       0.386
                                        67          -0.107       0.420
                                        68          -0.062       0.527
                                        69          -0.085       0.680
                                        70          -0.141       0.643

Antonyms                                18          -0.207       0.737
                                        19          -0.163      [illeg.]
                                        20          -0.065       0.507
                                        21          [illeg.]     0.251
                                        22          -0.015       0.069
                                        23           0.015       0.234
                                        24          [illeg.]    [illeg.]
                                        25          [illeg.]    [illeg.]
                                        26          [illeg.]    [illeg.]
                                        27           0.040      [illeg.]
                                        71          [illeg.]    [illeg.]
                                        72          -0.071       0.109
                                        73           0.098      [illeg.]
                                        74          [illeg.]    [illeg.]
                                        75           0.044      [illeg.]
                                        76          [illeg.]    [illeg.]
                                        77          [illeg.]    [illeg.]
                                        78          [illeg.]    [illeg.]
                                        79          [illeg.]    [illeg.]
                                        80          [illeg.]    [illeg.]

Reading Comprehension                   28           0.214      [illeg.]
                                        29           0.213      [illeg.]
                                        30          [illeg.]    [illeg.]
                                        39          [illeg.]    [illeg.]
                                        40          [illeg.]    [illeg.]
                                        41           0.235      -0.020
                                        42           0.230      -0.085
                                        43           0.490      [illeg.]
                                        44           0.454      [illeg.]
                                        45          [illeg.]    [illeg.]
                                        46          [illeg.]    [illeg.]
                                        50          [illeg.]    [illeg.]
                                        51          [illeg.]    [illeg.]
                                        52          [illeg.]    [illeg.]

Technical Reading Comprehension         31           0.539      [illeg.]
                                        32          [illeg.]    [illeg.]
                                        33          [illeg.]    [illeg.]
                                        34           0.625      -0.000
                                        35           0.611       0.020
                                        36           0.747       0.099
                                        37           0.627       0.027
                                        38           0.585      -0.041
                                        47           0.515       0.044
                                        48           0.666       0.001
                                        49           0.527       0.002

Factor Intercorrelations

                 Factor I    Factor II
Factor I          1.000        .066
Factor II          .066       1.000

[illeg.] marks an entry that is illegible in the source copy.






Factor analysis of partial correlations. In an effort to summarize
the results of the verbal item partial correlation analyses, the partial
correlations, not corrected for guessing, were subjected to factor analysis.
(The choice of the uncorrected-for-guessing partials was based on the
difficulty of estimating communalities using the corrected partials, as
well as concern about overcorrection.) Principal factor analysis (Harman,
1976, Chapter 6.3) was used to identify and extract the primary factors of
these verbal partial correlation matrices. Since the dominant (nonlinearly
derived) ability factor had been partialled out, these remaining factors
can be viewed as residual factors that might be systematic sources of
local independence violations. Following extraction, these residual
factors were rotated to an oblique solution using direct oblimin with
Kaiser normalization (Harman, 1976, Chapter 14.4).

The factor pattern (regression weights for predicting common portions
of item variables from underlying factors) and factor intercorrelations,
following a direct oblimin rotation of a two-factor solution for the Form
ZGR1 (6/80) verbal item intercorrelations with overall verbal ability
partialled out, appear in Table 10. Clearly, the first factor is defined
by the reading comprehension items, primarily the technical reading
comprehension items. The second factor appears to be a speed factor, as
the antonym items appearing at the end of the verbal section mark this
factor.

The corresponding results (factor pattern and intercorrelations) for
Form K-ZGR2 (12/79) appear in Table 11. Again, a two-factor solution was
obtained, although the plot of eigenvalues suggested that a one-factor
solution might have been sufficient. The first factor was clearly a
reading comprehension factor marked by very high loadings for technical
reading comprehension items in particular. The definition of the second
factor is difficult. It appears to be a mixture of analogies and antonyms,
but may well be a composite of noise components, i.e., there may be only
one meaningful residual factor, that marked by reading comprehension
items. The relatively high ability of the group that took Form K-ZGR2 may
have caused the speed factor noted in the ZGR1 analysis to dissipate.

In sum, the factor analysis of partial correlation matrices with
overall verbal ability partialled out produced results consistent with
the visual analysis of partial correlation distributions: evidence
for both a technical reading comprehension factor and a nuisance speed
factor.



Results for the Quantitative Subtest 

The 55 quantitative items were broken down into the following
three nominal item types:

     Item type                              Number of items

     Quantitative comparison                      30
     Regular mathematics                          15
     Data interpretation                          10




The four-choice quantitative comparison items all appear at the beginning
of the quantitative section, while regular mathematics and data
interpretation items were interspersed in the latter part of the section.
It was expected that speededness would prove to be less of a factor for
quantitative items than it was for verbal items since, in both test forms,
at least 80 percent of the examinees reached item 50 out of 55 items.

Factor analysis of partial correlations. The quantitative partial
correlation analyses were summarized by factor analyzing the partial
correlations, not corrected for guessing, using principal factor analysis.
Factors remaining after the nonlinearly derived dominant quantitative
factor had been partialled out can be viewed as residual factors that
might be systematic sources of local independence violations. Following
extraction, these residual factors were rotated to an oblique solution
using direct oblimin with Kaiser normalization.

The factor pattern (regression weights for predicting common portions
of the quantitative item variables from underlying factors) and factor
intercorrelations, following a direct oblimin rotation of a two-factor
solution for Form ZGR1 (6/80) quantitative item intercorrelations with
overall quantitative ability partialled out, appear in Table 12. Both
factors are marked by data interpretation items predominantly, suggesting
that the two residual factors are different types of data interpretation
factors. The corresponding results for Form K-ZGR2 (12/79) appear in
Table 13. Although two factors were extracted, a single-factor solution
was probably sufficient. This first factor is clearly marked by the data
interpretation items, while interpretation of the second factor is difficult
since it is probably a composite of noise components.



Summary and Synthesis 

Principal findings for the verbal subtest. The analysis of partial
correlations and the subsequent factor analysis for the verbal subtest
uncovered two systematic sources of local independence violation. The
reading comprehension items, particularly those pertaining to technical
reading passages, retained positive intercorrelations even after overall
verbal ability estimates were partialled out. Whether this reading
comprehension residual factor is a special skill or simply a function of
the fact that sets of items refer to a common passage cannot be absolutely
ascertained. Most likely, several influences are at work. In any case,
the end result is a violation of local independence.

The second systematic source, most evident in the analysis of form
ZGR1, is speededness. Test speededness tends to enhance the partial
correlations between items at the end of the test, probably because a
self-selected group of higher ability examinees attempt them while those
who do not reach them are of lower ability. This ability to perform well
on speeded tests is probably related imperfectly to overall verbal ability.
In other words, after overall verbal ability has been partialled out,






Table 12

Factor Pattern and Intercorrelations Among
Residual Factors Extracted from Form ZGR1
Quantitative Item Correlation Matrix in which
Overall Quantitative Ability Estimates Have Been Partialled Out

Item Type                   Item Position    Factor I    Factor II

Quantitative Comparisons          1           -0.643      -0.094
                                  2           -0.382      -0.180
                                  3           -0.318      -0.077
                                  4           -0.110      -0.176
                                  5           -0.190      -0.050
                                  6           -0.216      -0.022
                                  7           -0.216      -0.039
                                  8            0.038      -0.124
                                  9           -0.222      -0.066
                                 10           -0.023      -0.141
                                 11            0.121      -0.048
                                 12           -0.091      -0.058
                                 13            0.073      -0.150
                                 14           -0.064      -0.096
                                 15            0.261      -0.050
                                 16           -0.045      -0.031
                                 17            0.308       0.035
                                 18            0.023      -0.162
                                 19            0.224       0.022
                                 20            0.370      -0.154
                                 21            0.146      -0.056
                                 22            0.325      -0.051
                                 23            0.400      -0.133
                                 24            0.549      -0.008
                                 25            0.466      -0.286
                                 26            0.270      -0.009
                                 27            0.242       0.006
                                 28            0.228      -0.057
                                 29            0.147      -0.024
                                 30            0.122       0.036

Regular Mathematics              31           -0.241      -0.040
                                 32           -0.108       0.184
                                 33           -0.116      -0.128
                                 34            0.048      -0.187
                                 35            0.317       0.023
                                 40           -0.189       0.250
                                 41           -0.199       0.290
                                 42            0.083       0.158
                                 43           -0.103       0.270
                                 44           -0.026       0.202
                                 51            0.114       0.066
                                 52            0.526      -0.075
                                 53            0.514      -0.059
                                 54            0.042       0.155
                                 55           -0.005       0.761

Data Interpretation              36            0.369       0.567
                                 37           -0.151       0.905
                                 38            0.209       0.462
                                 39            0.040       0.623
                                 45            0.156       0.?68
                                 46            0.254       0.411
                                 47              ?         0.104
                                 48            0.406       0.194
                                 49            0.61?       0.16?
                                 50            0.228       0.167

Factor Intercorrelations
                             Factor I    Factor II
Factor I                       1.000        .059
Factor II                       .059       1.000

? = digit or entry illegible in the source copy.






Table 13

Factor Pattern and Intercorrelations Among
Residual Factors Extracted from Form K-ZGR2
Quantitative Item Correlation Matrix in which
Overall Quantitative Ability Estimates Have Been Partialled Out

Item Type                   Item Position    Factor I    Factor II

Quantitative Comparisons          1              ?           ?
                                  2              ?           ?
                                  3              ?           ?
                                  4              ?           ?
                                  5              ?           ?
                                  6              ?           ?
                                  7              ?        -0.177
                                  8              ?         0.127
                                  9              ?           ?
                                 10              ?           ?
                                 11              ?           ?
                                 12              ?           ?
                                 13              ?         0.157
                                 14              ?         0.132
                                 15              ?         0.059
                                 16              ?           ?
                                 17           -0.009       0.148
                                 18           -0.011         ?
                                 19              ?         0.194
                                 20              ?         0.123
                                 21              ?           ?
                                 22              ?         0.188
                                 23              ?         0.112
                                 24              ?         0.110
                                 25              ?         0.003
                                 26              ?         0.138
                                 27              ?           ?
                                 28            0.003         ?
                                 29              ?         0.176
                                 30            0.006       0.405

Regular Mathematics              31           -0.022      -0.142
                                 32           -0.142      -0.385
                                 33            0.142      -0.385
                                 34            0.095      -0.040
                                 35           -0.050      -0.331
                                 42           -0.068      -0.179
                                 43            0.181      -0.263
                                 44           -0.008       0.050
                                 45            0.121       0.287
                                 46            0.086       0.288
                                 51              ?         0.212
                                 52            0.020      -0.057
                                 53            0.218       0.054
                                 54            0.200       0.333
                                 55            0.291       0.077

Data Interpretation              36            0.178      -0.062
                                 37            0.667      -0.241
                                 38            0.804      -0.214
                                 39              ?           ?
                                 40            0.487       0.091
                                 41            0.447       0.129
                                 47            0.455       0.274
                                 48            0.381       0.161
                                 49            0.335       0.220
                                 50            0.314       0.192

Factor Intercorrelations
                             Factor I    Factor II
Factor I                       1.000        .129
Factor II                       .129       1.000

? = entry illegible in the source copy.






a residual speededness factor remains that systematically influences
performance on items appearing at the end of the test.

Principal findings for the quantitative subtest. The analysis of
partial correlations and the subsequent factor analyses for the quantitative
subtest uncovered a single major source of local independence violations:
a factor influencing performance on data interpretation items. On form
ZGR1 (6/80), this source seemed to be composed of two components that
might be related to differences in data interpretation passages. On form
K-ZGR2 (12/79), however, this separation into two components was not
evident. In any case, the data interpretation items exhibited positive
intercorrelations after general quantitative ability was partialled
out. Whatever accounted for these positive correlations is a source of
local independence violations.

Synthesis with previous factor analytic results. The partial correlation
analyses produced findings consistent with expectations based on the
factor analytic review described earlier in this report. The earlier factor
analytic studies provided strong evidence for the existence of three large
global factors in GRE Aptitude Test data: general quantitative ability,
vocabulary or discrete verbal ability, and reading comprehension or
general verbal reasoning ability. In addition, they provided evidence for
the existence of some smaller factors: technical reading comprehension,
data interpretation, and verbal speededness factors. The partial correlation
analysis just described produced evidence confirming some of these
results, most notably results that would suggest violations of local
independence.






ANALYSIS OF ITEM-ABILITY REGRESSIONS 

Frequently, researchers will try to assess the fit of a latent trait
model to real data using a chi-square test or other similar approaches
(Wright, 1977). Unfortunately, such tests require expected values that
are available only when we know the values of item or people parameters;
in the real world we only have estimates of these parameters. These
estimates are likely to behave differently from true parameters in a
statistical test and would probably increase the probability of a type II
statistical error; that is, we would not reject the null hypothesis that
the model fits as frequently as we should.

To avoid this problem, a graphical technique and some quantitative
summaries of that technique were used in a roughly normative manner to
assess the fit of the three-parameter logistic model. This exploratory
technique, which will be referred to as analysis of item-ability regressions,
compares the regression of the observed proportion of people getting an
item correct on estimated θ (empirical regression) with the item
response function based on the estimated item parameters (estimated
regression) (Hambleton, 1980; Stocking, 1980).

The untransformed ability scale (θ estimated on the metric for
which the trimmed calibration sample, examinees with estimated θ between
-3.0 and 3.0, has a mean of 0 and a standard deviation of 1) is split into
15 intervals of width .4 in the range -3.0 to +3.0. P_i, the proportion
of people in interval i getting the item correct, adjusted for omits, is
computed for each interval. That is,

     (3)     P_i = \frac{n_i^{c} + n_i^{o}/A}{n_i} ,   where

             n_i^{c} is the number of examinees in the i-th
                     interval who got the item correct,

             n_i^{o} is the number of examinees in the i-th
                     interval who omitted the item,

             A       is the number of alternatives per item,

             n_i     is the number of examinees in interval i
                     who answered the item or any item
                     subsequent to that item.
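
The omit-adjusted proportion of equation (3) can be sketched in a few
lines. The function and argument names below are illustrative assumptions,
as are the example counts and the treatment of ability estimates outside
the -3.0 to +3.0 range (they are simply dropped here).

```python
def interval_index(theta, lower=-3.0, width=0.4, n_intervals=15):
    """Return the 0-based ability interval containing theta, or None if outside."""
    if theta < lower:
        return None
    k = int((theta - lower) // width)
    return k if k < n_intervals else None


def adjusted_proportion(n_correct, n_omit, n_reached, n_alternatives):
    """Equation (3): omits are credited at the chance rate 1/A."""
    if n_reached <= 0:
        raise ValueError("no examinees reached the item in this interval")
    return (n_correct + n_omit / n_alternatives) / n_reached


# Hypothetical counts for one interval of a five-choice item.
p_i = adjusted_proportion(n_correct=30, n_omit=10, n_reached=50, n_alternatives=5)
print(interval_index(0.0), p_i)
```

Crediting each omit with 1/A of a correct response treats an omitting
examinee as if he or she had guessed at random among the A alternatives.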

The 15 P_i are plotted as squares whose areas are proportional to n_i.
For each interval, a line of length 4√(PQ/n_i) is plotted, where P and
Q = 1 - P are computed from the estimated item response function. The line
is centered on the estimated response function. Although this line is a
rough estimate of the .95 confidence interval around the item response
function, it is not being used as a statistical test. The reasons why
this line does not represent the .95 confidence interval include: the
use of 2 instead of 1.96 as a coefficient; the use of the inappropriate
symmetric normal approximation to the binomial confidence interval around
the response function (particularly a problem for extreme values of P);
and the use of an interval based on estimated item parameters.
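
The vertical line drawn for each interval can be sketched as follows. The
three-parameter logistic function is written in its usual form; the scaling
constant D = 1.7, the function names, and the example parameter values are
assumptions made for illustration and are not taken from this report.

```python
import math


def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 is assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def plotted_line(theta_mid, n, a, b, c):
    """Lower and upper ends of the line of length 4*sqrt(PQ/n) centered on P."""
    p = irf_3pl(theta_mid, a, b, c)
    half = 2.0 * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half


# Hypothetical item (a = 1.0, b = 0.0, c = 0.2) and an interval of 100 examinees.
low, high = plotted_line(theta_mid=0.0, n=100, a=1.0, b=0.0, c=0.2)
print(low, high)
```

Because the half-length is 2√(PQ/n), the line shrinks as the number of
examinees in the interval grows, which is why sparsely populated extreme
intervals carry long lines.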

Figures 12a through 12f show six examples of item-ability regressions.
The vertical scale in each is the probability of a correct response
and ranges from 0 to 1. The horizontal scale is the ability metric
and ranges from -3.0 to +3.0. Various attributes of these item-ability
regressions relate to model fit. After looking at more than 1,000 of
these plots, we decided that a useful summary statistic would be the
number of times the proportion of the examinees in an interval responding
correctly to the item fell outside the ± 2√(PQ/n) interval centered on
the response function; that is, the number of times the midpoints of the
boxes fell off the vertical lines. Thus, the item-ability regressions in
12a and 12b would each be scored 0, those in 12c and 12d would be scored 2
and 3, respectively, and those in 12e and 12f would be scored 5 and 9.
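
The summary statistic just described amounts to a count over intervals. A
minimal sketch follows; the function name, the skipping of empty intervals,
and the three-interval example are assumptions made for illustration.

```python
import math


def model_fit_score(observed, expected, counts):
    """Count intervals whose observed proportion lies outside P +/- 2*sqrt(PQ/n)."""
    score = 0
    for p_obs, p_exp, n in zip(observed, expected, counts):
        if n == 0:
            continue  # assumed: an empty interval contributes nothing
        half = 2.0 * math.sqrt(p_exp * (1.0 - p_exp) / n)
        if abs(p_obs - p_exp) > half:
            score += 1
    return score


# Hypothetical item: only the middle interval misses its line.
score = model_fit_score(
    observed=[0.50, 0.80, 0.30],
    expected=[0.50, 0.60, 0.32],
    counts=[100, 100, 100],
)
print(score)
```

With 15 intervals per item the score can range from 0 to 15, and lower
scores indicate closer agreement between the empirical and estimated
regressions.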

This analysis is based on 395 verbal, 275 quantitative, and 136
analytical items. The verbal and quantitative items consist of all such
operational items from four administrations of the four GRE Aptitude Test
forms studied in this research. The analytical items consist of all
operational items from forms 3CGR1 and ZGR1.

Table 14 presents cumulative distributions of item scores on the
model fit statistic described above. Data are presented for the three
major item classifications and their constituent item types. All data
presented in this table are based on verbal, quantitative, or analytical
calibrations.

To aid interpretation of these data, frequencies of model fit score
were collapsed into two categories (0-1, 2+) and compared across item types
with a chi-square test of independence. Table 15 presents these results
for the three major item classifications.



Figure 12 




Table 14

Assessment of Model Fit

                                        Cumulative Proportion of Items
                                   with Model Fit Score Less Than or Equal to
                           Number
Item Type                 of Items    0     1     2     3     4     5     6     7     8

All Verbal                   395      ?    .87   .96   .99    ?     ?     ?     ?     ?
  Analogies                   90     .62   .84   .93   .98  1.00
  Antonyms                   102     .67   .91   .97   .99   .99  1.00
  Sentence Completions        81     .56   .88   .95  1.00
  Reading Comprehension      122     .66   .86   .97   .99  1.00

All Quantitative             275     .45   .69   .82   .89   .94   .96   .98   .99  1.00
  Regular Mathematics         75     .45   .80    ?    .95   .96   .96   .97   .97  1.00
  Data Interpretation         50     .56   .80   .90   .91   .98   .98   .98   .98  1.00
  Quantitative Comparison    150     .41   .60   .75   .85   .91   .96   .99   .99  1.00

All Analytical               136     .59   .82   .95   .98   .99   .99   .99   .99  1.00
  Analysis of Explanations    76     .54   .76   .93   .96   .97   .97   .97   .99  1.00
  Logical Diagrams            30     .70   .97   .97  1.00
  Analytical Reasoning        30     .60   .83   .97  1.00

All Items                    806     .56   .80   .91   .96   .98   .99   .99   .99  1.00

? = entry illegible or missing in the source copy.






Table 15

Comparison of Model Fit for Three Major
Item Classifications

                            Model Fit Score
Item Classification          0-1      2+     Total

Verbal                       345      50      395
Quantitative                 190      85      275
Analytical                   112      24      136
Total                        647     159      806

χ² = 34.55
df = 2
p < .0001









The high χ² for Table 15 shows a relationship between broad item
classification and model fit. Whether or not the three-parameter logistic
model fits any of the item types in an absolute sense, Table 15 shows that
some fit more closely than others. In particular, the order of fit seems
to be (from best to worst) verbal, analytical, quantitative. Since these
differences might be due to specific item types, each broad classification
was separately analyzed by specific item type. Table 16 presents these
results for verbal items, Table 17 for quantitative items, and Table 18 for
analytical items.
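
The chi-square test of independence applied to Table 15 can be reproduced
from the cell counts alone. The sketch below uses the ordinary Pearson
statistic; the function name is an assumption, and the closed-form tail
probability exp(-χ²/2) holds only for 2 degrees of freedom, as here.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: verbal, quantitative, analytical. Columns: model fit score 0-1, 2+.
table_15 = [[345, 50], [190, 85], [112, 24]]
chi2, df = chi_square_independence(table_15)
p_value = math.exp(-chi2 / 2.0)  # exact upper tail only when df == 2
print(round(chi2, 2), df, p_value)
```

Run on the Table 15 counts, this gives a statistic that rounds to the
reported 34.55 on 2 degrees of freedom, with a tail probability well below
.0001.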



Table 16

Comparison of Model Fit for
Verbal Item Types

                            Model Fit Score
Item Classification          0-1      2+     Total

Analogies                     76      14       90
Antonyms                      93       9      102
Sentence Completion           71      10       81
Reading Comprehension        105      17      122
Total                        345      50      395

χ² = 2.23
df = 3
p < .5267








Table 17

Comparison of Model Fit for
Quantitative Item Types

                            Model Fit Score
Item Classification          0-1      2+     Total

Regular Mathematics           60      15       75
Data Interpretation           40      10       50
Quantitative Comparison       90      60      150
Total                        190      85      275

χ² = 12.77
df = 2
p < .0017










Table 18

Comparison of Model Fit for
Analytical Item Types

                            Model Fit Score
Item Classification          0-1      2+     Total

Analysis of Explanations      58      18       76
Logical Diagrams              29       1       30
Analytical Reasoning          25       5       30
Total                        112      24      136

χ² = 6.16
df = 2
p < .0461












The four verbal item types presented in Table 16 show no significant
difference in model fit. Of the three quantitative item types presented
in Table 17, the model fits the quantitative comparison items least well.

One feature of quantitative comparison items is that they all share
the same response options and instructions:

     Directions: Each question in this part consists of two quantities,
     one in Column A and one in Column B. You are to compare the two
     quantities and on the answer sheet blacken space

          A if the quantity in Column A is the greater;

          B if the quantity in Column B is the greater;

          C if the two quantities are equal;

          D if the relationship cannot be determined
            from the information given.

This might lead to multidimensionality due to the particular correct
response of the item. To investigate this, a chi-square test of independence
between the keyed response and model fit score (collapsed into two
categories) was performed. Results are presented in Table 19. There is
no evidence for any response option factors.



Table 19

Comparison of Model Fit for Different
Keyed Responses of Quantitative Comparisons Items

                            Model Fit Score
Keyed Response               0-1      2+     Total

A                             23      15       38
B                             21      19       40
C                             27      12       39
D                             19      14       33
Total                         90      60      150

χ² = 2.41
df = 3
p < .4923





Alternatively, it could be argued that another type of multidimensionality
caused the model fit problem. Perhaps quantitative comparison
items themselves are unidimensional, but are tapping a different dimension
from the rest of the quantitative items. Factor analytic results, reviewed
earlier in this report, did not indicate this to be the case, but the past
factor analytic studies used linear models, and item response theory is
based on a nonlinear model. A separate quantitative comparison factor
could not be ruled out.



To further investigate this, the quantitative comparison items for
one form (K-ZGR3) were separately calibrated. Item-ability regressions
for items in this calibration could not be affected by multidimensionality
inherent across the three quantitative item types. Table 20 compares the
model fit for the 30 quantitative comparison items calibrated with the
entire quantitative section with that for the items calibrated as a
homogeneous subset.

Table 20

Comparison of Model Fit for Homogeneous
and Heterogeneous Calibrations of Quantitative
Comparison Items

                                  Model Fit Score
Calibration                        0-1      2+     Total

Quantitative Comparison Only        18      12       30
All Quantitative Items              19      11       30
Total                               37      23       60



Since different calibrations of identical items are represented in
the two rows of Table 20, a test of independence was not performed.
Nonetheless, it seems obvious that any multidimensionality occurs within
the item type and not across the three quantitative item types.

Further examination of the items and their directions leads us to
hypothesize another type of dimensionality problem. Due to a problem
solving response set, some examinees who did not know the answer to a
quantitative comparison item might be more likely to answer D, "the
relationship cannot be determined from the information given," than
others of equal quantitative ability, in which case the poor model fit of
these items might be explained. This problem solving response set would
contribute to a lack of model fit, regardless of the keyed response. If
the correct answer were A, B, or C, some examinees with a given ability
would be less likely to pick the correct answer than others because of
their propensity for response D. If D were the correct answer, these same
examinees would be more likely to pick the correct answer than the model
predicted.

Table 18 indicates that the three-parameter logistic model fits
analysis of explanations items less well than the other analytical item
types. Like quantitative comparisons items, these items all share a
single response format:






     Directions: For each set of questions, a fact situation and a
     result are presented. Several numbered statements follow the
     result. Each statement is to be evaluated in relation to the
     fact situation and result.

     Consider each statement separately from the other statements.
     For each one, examine the following sequence of decisions, in
     the order A, B, C, D, E. Each decision results in selecting or
     eliminating a choice. The first choice that cannot be eliminated
     is the correct answer.

          A  Is the statement inconsistent with, or contradictory to,
             something in the fact situation, the result, or both
             together? If so, choose A.
             If not,

          B  Does the statement present a possible adequate explanation
             of the result? If so, choose B.
             If not,

          C  Does the statement have to be true if the fact situation
             and result are as stated?
             If so, the statement is deducible from something in the
             fact situation, the result, or both together; choose C.
             If not,

          D  Does the statement either support or weaken a possible
             explanation of the result?
             If so, the statement is relevant to an explanation;
             choose D.

          E  If not, the statement is irrelevant to an explanation
             of the result; choose E.

Table 21 presents a test of independence between keyed response
and model fit.

Table 21

Comparison of Model Fit for Different
Keyed Responses of Analysis of Explanations Items

                            Model Fit Score
Keyed Response               0-1      2+     Total

A                             10       1       11
B                              7      10       17
C                             18       1       19
D                             16       0       16
E                              7       6       13
Total                         58      18       76

χ² = 25.07
df = 4
p < .0001





Analysis of explanations items keyed B or E were not fit well by
the model. In fact, some of the B-keyed items are not monotonically
increasing; more able students frequently choose the 6 response. Figure
12f presents the most extreme example of such an item we have found.
Factor analysis (Swinton & Powers, 1980) has provided additional evidence
of keyed response specific factors for analysis of explanations items.

In summary, the three-parameter logistic model seems to fit all of
the verbal item types and two of the analytical item types, logical
diagrams and analytical reasoning, better than the three quantitative item
types and the analysis of explanations items. Of the latter four item
types, regular mathematics and data interpretation items seem to be fit
almost as well as some of the "good fitting" item types. Analysis of
explanations items keyed other than B or E were fit by the model quite
well (less than 5 percent of the items keyed A, C, or D have a model fit
score of 2 or greater), but those keyed B or E have the highest proportion
of model fit scores of 2 or greater of any of the item classifications we
considered (53%). Quantitative comparison items were the most difficult
item type for the three-parameter logistic model to fit.
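
The two percentages quoted for analysis of explanations items follow
directly from the keyed-response counts in Table 21. The short check below
is only an illustration; the dictionary layout and function name are
assumptions.

```python
# (items with model fit score 0-1, items with score 2+) by keyed response,
# taken from Table 21.
table_21 = {"A": (10, 1), "B": (7, 10), "C": (18, 1), "D": (16, 0), "E": (7, 6)}


def share_with_score_2_plus(keys):
    """Proportion of items with a model fit score of 2 or greater."""
    good = sum(table_21[k][0] for k in keys)
    poor = sum(table_21[k][1] for k in keys)
    return poor / (good + poor)


print(round(100 * share_with_score_2_plus("BE")))      # 53
print(round(100 * share_with_score_2_plus("ACD"), 1))  # 4.3
```

Sixteen of the 30 items keyed B or E have scores of 2 or greater, against
2 of the 46 items keyed A, C, or D.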




COMPARABILITY, SENSITIVITY, AND STABILITY OF PARAMETER ESTIMATES

Temporal Stability of Item Parameter Estimates 

The operational sections of form ZGR1 were administered twice, once
in February and once in June 1980, which allows us to assess the temporal
stability of item parameter estimates. Theoretically, the item response
function for each item should not be affected by when the item was
administered, provided that a common metric has been established. The
section on parameter estimation and item linking describes the procedure
used to place all item parameter estimates on the same scale. Thus, any
discrepancies in item parameter estimates should be due either to lack of
fit of the three-parameter logistic model because of population shifts or
to errors of estimation. (Though item response theory provides sample
invariant parameter estimation, this sample invariance applies to samples
(of the same or different ability) from a single population. Population
shifts can cause a change in dimensionality.) In this section, the two sets
of item parameter estimates (after transformation to a common metric) for
form ZGR1 are compared for the verbal calibrations, the discrete verbal
calibrations, the reading comprehension calibrations, and the quantitative
calibrations. Tables 22 through 24 summarize these comparisons.

In Table 22, means, standard deviations, and correlations between
parameter estimates obtained at both administrations are presented
for all 55 discrete verbal items in Section I of form ZGR1. The upper
half of the table contains results for the verbal calibrations of these
items; the results for the discrete verbal calibrations of these items
are presented in the lower half of this table. The parameters a_g, b_g,
and c_g are the item discrimination, item difficulty, and pseudoguessing
parameters of the three-parameter logistic model. The p_g is an estimate
of conventional item difficulty, the proportion of examinees giving
a correct response to the item, that is based on the item response
function and the marginal distribution of ability for the group of
examinees given that item. The p_g can be viewed as a nonlinear bounding
transformation of b_g. This bounding transformation was performed for two
reasons. First, extreme values of b_g have large standard errors, while
extreme values of p_g do not. Second, the Pearson product-moment correlation
used in this section is sensitive to outliers, and a bounded item difficulty
parameter, such as p_g, is less likely to produce troublesome outliers.
The p_g values, however, are sensitive to any large differences in group
ability and could produce a nonlinear relationship between the p_g estimates
of the form ZGR1 items based on the two administrations. As it turned out,
the abilities of the two groups were similar enough that nonlinearity was
not a problem.
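
One way to see why an estimated proportion correct behaves as a bounded
version of item difficulty is to average the item response function over a
group's ability estimates. The sketch below does this for a small
hypothetical group; the scaling constant D = 1.7, the function names, and
all parameter values are assumptions, and the report may have computed its
proportion-correct estimate differently (for example, from a fitted ability
distribution).

```python
import math


def irf_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 is assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def expected_proportion_correct(thetas, a, b, c):
    """Average the item response function over a group's ability estimates."""
    return sum(irf_3pl(t, a, b, c) for t in thetas) / len(thetas)


# Hypothetical ability estimates for a small group of examinees.
group = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

p_typical = expected_proportion_correct(group, a=1.0, b=0.0, c=0.2)
p_hard = expected_proportion_correct(group, a=1.0, b=6.0, c=0.2)
p_harder = expected_proportion_correct(group, a=1.0, b=12.0, c=0.2)
print(p_typical, p_hard, p_harder)
```

Doubling an already extreme difficulty from 6 to 12 barely moves the
proportion correct, which stays just above the lower asymptote of .2. An
extreme difficulty estimate therefore cannot turn into an outlier on the
proportion-correct scale.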

The means and standard deviations to the right of each rectangle are
the means and standard deviations of the three item parameters and p_g for
the June 1980 administration of form ZGR1, while the summary statistics
for the February 1980 administration of form ZGR1 appear under each
rectangle. The elements inside the rectangle are correlations between the
estimates obtained at the two administrations of form ZGR1. Note that
both item difficulty estimates, b_g and p_g, were virtually insensitive






TABLE 22

Correlations and Summary Statistics for Item
Parameters and Estimated Proportion Correct for
the 55 Discrete Verbal Items of Section I of Form ZGR1

ALL VERBAL CALIBRATION

                                ZGR1 (2/80)
                       a_g     b_g     c_g     p_g       Mean    S.D.

ZGR1 (6/80)    a_g    .914                               .899    .312
               b_g            .988                       .474   1.226
               c_g                    .821               .183    .060
               p_g                            .998       .506    .200

               Mean   .923    .482    .192    .507
               S.D.   .314   1.253    .063    .201

n = 55

DISCRETE VERBAL ONLY CALIBRATION

                                ZGR1 (2/80)
                       a_g     b_g     c_g     p_g       Mean    S.D.

ZGR1 (6/80)    a_g    .955                               .905    .328
               b_g            .993                       .469   1.225
               c_g                    .842               .180    .049
               p_g                            .998       .502    .202

               Mean   .912    .467    .182    .504
               S.D.   .333   1.243    .044    .204

n = 55






TABLE 23

Correlations and Summary Statistics for Item
Parameters and Estimated Proportion Correct for
the 25 Reading Comprehension Items of
Section I of Form ZGR1

ALL VERBAL CALIBRATION

                                ZGR1 (2/80)
                       a_g     b_g     c_g     p_g       Mean    S.D.

ZGR1 (6/80)    a_g    .918                               .802    .185
               b_g            .992                      -.028    .831
               c_g                    .685               .171    .033
               p_g                            .998       .585    .156

               Mean   .814   -.039    .167    .585
               S.D.   .175    .792    .041    .153

n = 25

READING COMPREHENSION ONLY CALIBRATION

                                ZGR1 (2/80)
                       a_g     b_g     c_g     p_g       Mean    S.D.

ZGR1 (6/80)    a_g    .946                               .920    .270
               b_g            .994                      -.007    .823
               c_g                    .709               .166    .036
               p_g                            .998       .582    .164

               Mean   .932   -.021    .166    .58?
               S.D.   .289    .773    .039    .138

n = 25

? = digit illegible in the source copy.






TABLE 24

Correlations and Summary Statistics for Item
Parameters and Estimated Proportion Correct
for the 55 Quantitative Items of
Section II of Form ZGR1

                                ZGR1 (2/80)
                       a_g     b_g     c_g     p_g       Mean    S.D.

ZGR1 (6/80)    a_g    .969                               .856    .398
               b_g            .996                       .005   1.518
               c_g                    .927               .183    .074
               p_g                            .939       .576    .232

               Mean   .849    .020    .181    .573
               S.D.   .391   1.517    .073    .231

n = 55






to group differences and showed little sample error, but were slightly
sensitive to the reference set used for calibration; i.e., the difference
between mean item difficulties, b_g, is greater in the verbal calibrations
(.474 vs. .482) than the difference between mean item difficulties for the
discrete verbal calibrations (.469 vs. .467). Note also the corresponding
differences in c_g (verbal calibrations, .183 vs. .192; discrete verbal
calibrations, .180 vs. .182). The differences in p_g (.506 vs. .507 for
verbal and .502 vs. .504 for discrete verbal) indicate that these
differences compensate for each other. One can infer that these
differences are probably due to error of estimation. Note that c_g
exhibits the least temporal stability.

Table 23, which has the same format as Table 22, contains means,
standard deviations, and correlations obtained for all 25 reading
comprehension items in Section I of form ZGR1. Note that p_g is virtually
insensitive to group differences or item calibration reference set. The
consistency of the item response theory estimate of difficulty, b_g,
however, is slightly imperfect. The most notable effect evident in
Table 23 is sensitivity of a_g to homogeneity of item calibration set:
The mean a_g for the 25 reading comprehension items is higher when these
25 items are calibrated with reading comprehension items only than when
calibrated with all verbal items. Further discussion of homogeneity
effects is deferred to the next section. The final point to note in Table
23 is the comparatively low correlations obtained between c_g estimates.
This is due to the relative easiness of the reading comprehension items (b_g
slightly below .0 as opposed to discrete verbal b_g of about .5). It is
difficult to estimate c_g for easy items because of insufficient data
at the lower asymptote.

Table 24 contains the means, standard deviations, and correlations
obtained for the 55 quantitative items in Section II of form ZGR1. The
high correlations for a_g and c_g and the overall stability of item parameter
estimates are the notable features of this table.



Sensitivity of Item Parameter Estimates to Violations of Unidimensionality 

Evidence indicating that verbal items are not homogeneous, i.e., that
they measure more than one dimension, was presented in the sections of this
report dealing with the factor analytic review, the violation of local
independence, and item-ability regressions. In this section, the
comparability of item parameter estimates based on calibration of
heterogeneous (all verbal) and homogeneous (discrete verbal only and reading
comprehension only) item sets is assessed. Calibrations from all five
administrations, ZGR1 (6/80), ZGR1 (2/80), K-ZGR2 (12/79), K-ZGR3 (4/80),
and 3CGR1 (6/80), are examined.

Table 25 contains the results for estimates of item discrimination (a_g).
The results for discrete verbal items appear in the top half of the table,
while the bottom half contains the results for reading comprehension items.
The elements in the top rectangle of Table 25 are correlations between
item discrimination estimates based on verbal and discrete verbal calibrations,






TABLE 25

Summary Statistics for and Correlations Between
Parameter Estimates of Item Discrimination (a_g)
Based on Sets of Homogeneous and Heterogeneous Items

                                   DISCRETE VERBAL ONLY
                     ZGR1     ZGR1    K-ZGR2   K-ZGR3   3CGR1
ALL VERBAL          (6/80)   (2/80)  (12/79)   (4/80)  (6/80)      n    Mean    S.D.

ZGR1 (6/80)          .969                                         108   .922    .316
ZGR1 (2/80)                    ?                                  102   .885    .344
K-ZGR2 (12/79)                         .984                       102   .898    .343
K-ZGR3 (4/80)                                   .975              102   .874    .336
3CGR1 (6/80)                                             .976     108   .954    .320

n                    108      102      102      102      108
Mean                 .930     .881     .936     .876     .963
S.D.                 .331     .357     .380     .344     .328

                                 READING COMPREHENSION ONLY
                     ZGR1     ZGR1    K-ZGR2   K-ZGR3   3CGR1
ALL VERBAL          (6/80)   (2/80)  (12/79)   (4/80)  (6/80)      n    Mean    S.D.

ZGR1 (6/80)          .800                                          47   .791    .200
ZGR1 (2/80)                   .889                                 45   .730    .237
K-ZGR2 (12/79)                         .926                        45   .730    .245
K-ZGR3 (4/80)                                   .904               45   .759    .285
3CGR1 (6/80)                                             .902      47   .761    .201

n                     47       45       45       45       47
Mean                 .867     .837     .824     .848     .844
S.D.                 .287     .338     .349     .324     .271

? = entry illegible in the source copy.












while the correlations between estimates based on verbal and reading
comprehension calibrations appear in the bottom rectangle.

Under the top rectangle are the number of items calibrated (n), means,
and standard deviations of the a estimates for the discrete verbal calibrations at
each of the five administrations. To the right of the top rectangle are
the summary statistics for the corresponding verbal calibrations of the
discrete verbal items. Under the bottom rectangle are summary statistics
for the five reading comprehension calibrations, while the corresponding
summary statistics for the five verbal calibrations of the reading
comprehension items appear to the right of this bottom rectangle.


Tables 26, 27, and 28 are identical in format to Table 25 and contain
the results for item difficulty (b) estimates, estimates of the pseudo-guessing
parameter or lower asymptote (c), and estimated proportion
correct (p).

The correlations and means in Table 25 reveal that the discrete
verbal and verbal calibrations produce considerably more similar estimates
of a than the reading comprehension and verbal calibrations. The discrete
verbal - verbal correlations between a estimates range from .97 to .98,
while the reading comprehension - verbal correlations range from .80 to
.93. The mean differences between a estimates for the discrete verbal
items range from .00 to .04, while the range of mean differences for
reading comprehension items is .07 to .11. When the smaller standard
deviations of a estimates for reading comprehension items are considered,
the magnitude of the mean differences for these items appears even larger
relative to the magnitude of the mean difference for discrete verbal items.
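The comparison above can be checked from the means and standard deviations printed in Table 25. The Python sketch below is illustrative only: the dictionary names and the `summarize` helper are assumptions introduced here, and dividing each mean difference by the all-verbal S.D. is just one possible way (not necessarily the authors') of expressing a difference relative to the spread of the a estimates.

```python
# Values as printed in Table 25: (mean from homogeneous calibration,
# mean from all-verbal calibration, S.D. from all-verbal calibration).
discrete_verbal = {
    "ZGR1 (6/80)":    (0.930, 0.922, 0.316),
    "ZGR1 (2/80)":    (0.881, 0.885, 0.344),
    "K-ZGR2 (12/79)": (0.936, 0.898, 0.343),
    "K-ZGR3 (4/80)":  (0.876, 0.874, 0.336),
    "CGR1 (6/80)":    (0.963, 0.954, 0.320),
}
reading_comprehension = {
    "ZGR1 (6/80)":    (0.867, 0.791, 0.200),
    "ZGR1 (2/80)":    (0.837, 0.730, 0.237),
    "K-ZGR2 (12/79)": (0.824, 0.730, 0.245),
    "K-ZGR3 (4/80)":  (0.848, 0.759, 0.285),
    "CGR1 (6/80)":    (0.844, 0.761, 0.201),
}


def summarize(table):
    """Return {form: (mean difference, mean difference / S.D.)}."""
    rows = {}
    for form, (homogeneous, heterogeneous, sd) in table.items():
        diff = homogeneous - heterogeneous
        rows[form] = (round(diff, 3), round(diff / sd, 2))
    return rows


if __name__ == "__main__":
    for label, table in (("Discrete verbal", discrete_verbal),
                         ("Reading comprehension", reading_comprehension)):
        print(label)
        for form, (diff, relative) in summarize(table).items():
            print(f"  {form:<15} diff = {diff:+.3f}   diff / S.D. = {relative:+.2f}")
```

Under these assumptions, every reading comprehension difference is larger than every discrete verbal difference, both in raw units and relative to the standard deviation.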

Also evident from Table 25, in each pair of calibrations, for both
discrete verbal-verbal and reading comprehension-verbal, is the fact that
the standard deviation for the a estimates based on the more homogeneous
calibrations is higher. The mean standard deviations of a estimates for
the discrete verbal items based on the discrete verbal calibrations and
the verbal calibrations are .349 and .332, respectively. Similarly, the
mean standard deviations of a estimates for the reading comprehension
items based on the reading comprehension calibrations and the verbal
calibrations are .315 and .237, respectively. As with the differences in
mean estimates, the difference in mean standard deviations of a estimates
is more extreme for reading comprehension items than for discrete verbal
items.

Evidence pertaining to the comparability of item difficulty estimates (b)
appears in Table 26. The correlations and means in this table reveal that
the discrete verbal and verbal calibrations produce slightly more similar
estimates than the reading comprehension and verbal calibrations. For the
discrete verbal items, the correlations all round to 1.00, while the
mean differences range from .00 to .01. For the reading comprehension
items, the correlations range from .98 to 1.00 and the mean differences
in b range from .00 to .03. When compared to the results for the a
estimates, the b estimates show much less sensitivity to homogeneity of
item calibration set.






TABLE 26

Summary Statistics for and Correlations Between
Parameter Estimates of Item Difficulty (b)
Based on Sets of Homogeneous and Heterogeneous Items

                              DISCRETE VERBAL ONLY
                 ZGR1     ZGR1     K-ZGR2   K-ZGR3   CGR1
ALL VERBAL       (6/80)   (2/80)   (12/79)  (4/80)   (6/80)       n   Mean   S.D.

ZGR1 (6/80)      .998                                           108   .336  1.229
ZGR1 (2/80)               .996                                  102   .330  1.222
K-ZGR2 (12/79)                     .998                         102   .269  1.284
K-ZGR3 (4/80)                               .999                102   .259  1.302
CGR1 (6/80)                                          .998       108   .361  1.143

n                108      102      102      102      108
Mean             .334     .335     .255     .265     .366
S.D.             1.237    1.211    1.281    1.330    1.154


                           READING COMPREHENSION ONLY
                 ZGR1     ZGR1     K-ZGR2   K-ZGR3   CGR1
ALL VERBAL       (6/80)   (2/80)   (12/79)  (4/80)   (6/80)       n   Mean   S.D.

ZGR1 (6/80)      .993                                            47   .167   .952
ZGR1 (2/80)               .994                                   45   .433   .978
K-ZGR2 (12/79)                     .996                          45   .367   .959
K-ZGR3 (4/80)                               .978                 45   .369  1.092
CGR1 (6/80)                                          .995        47   .180   .954

n                47       45       45       45       47
Mean             .162     .453     .387     .347     .152
S.D.             .950     .979     .981     1.060    .921














TABLE 27

Summary Statistics for and Correlations Between
Parameter Estimates of Lower Asymptote (c)
Based on Sets of Homogeneous and Heterogeneous Items

                              DISCRETE VERBAL ONLY
                 ZGR1     ZGR1     K-ZGR2   K-ZGR3   CGR1
ALL VERBAL       (6/80)   (2/80)   (12/79)  (4/80)   (6/80)       n   Mean   S.D.

ZGR1 (6/80)      .897                                           108   .177   .054
ZGR1 (2/80)               .767                                  102   .183   .053
K-ZGR2 (12/79)                     .874                         102   .179   .049
K-ZGR3 (4/80)                               .940                102   .175   .040
CGR1 (6/80)                                          .932       108   .181   .058

n                108      102      102      102      108
Mean             .176     .180     .161     .173     .177
S.D.             .047     .040     .059     .040     .059


                           READING COMPREHENSION ONLY
                 ZGR1     ZGR1     K-ZGR2   K-ZGR3   CGR1
ALL VERBAL       (6/80)   (2/80)   (12/79)  (4/80)   (6/80)       n   Mean   S.D.

ZGR1 (6/80)      .658                                            47   .165   .043
ZGR1 (2/80)               .844                                   45   .168   .011
K-ZGR2 (12/79)                     .800                          45   .169   .033
K-ZGR3 (4/80)                               .550                 45   .175   .065
CGR1 (6/80)                                          .?23        47   .164   .039

n                47       45       45       45       47
Mean             .159     .168     .172     .168     .158
S.D.             .042     .034     .037     .037     .039












TABLE 28

Summary Statistics for and Correlations Between
Parameter Estimates of Proportion Correct (p)
Based on Sets of Homogeneous and Heterogeneous Items

                              DISCRETE VERBAL ONLY
                 ZGR1     ZGR1     K-ZGR2   K-ZGR3   CGR1
ALL VERBAL       (6/80)   (2/80)   (12/79)  (4/80)   (6/80)       n   Mean   S.D.

ZGR1 (6/80)      .999                                           108   .523   .205
ZGR1 (2/80)               .999                                  102   .529   .202
K-ZGR2 (12/79)                     .999                         102   .539   .210
K-ZGR3 (4/80)                               .999                102   .537   .218
CGR1 (6/80)                                          .999       108   .519   .202

n                108      102      102      102      108
Mean             .520     .527     .531     .5?5     (illeg.)
S.D.             .208     .204     .210     .220     .204


                           READING COMPREHENSION ONLY
                 ZGR1     ZGR1     K-ZGR2   K-ZGR3   CGR1
ALL VERBAL       (6/80)   (2/80)   (12/79)  (4/80)   (6/80)       n   Mean   S.D.

ZGR1 (6/80)      .999                                            47 (illeg.) .179
ZGR1 (2/80)               .998                                   45   .513   .169
K-ZGR2 (12/79)                     .999                          45   .522   .171
K-ZGR3 (4/80)                               .999                 45   .520   .187
CGR1 (6/80)                                          .998        47   .555   .167

n                47       45       45       45       47
Mean             .551     .509     .522     .517     .554
S.D.             .177     .174     .178     .190     .174












The results pertaining to the sensitivity of c estimates to homogeneity
of item calibration set are portrayed in Table 27. With the exception of
the discrete verbal items on form K-ZGR2, all mean differences in this
table are less than .01. Compared to those in Tables 25 and 26,
correlations in Table 27 are low and more variable, reflecting difficulties
inherent in obtaining stable estimates of c (Lord, 1975b).

Table 28 reveals that the similarity of p estimates based on
heterogeneous vs. homogeneous calibrations is very high. This high degree
of similarity is evident for both discrete verbal items and reading
comprehension items, as is reflected in the means and correlations in this
table. An inference suggested by the results in Table 28 is that the
observed data can be approximated equally well by sets of heterogeneous
items (all verbal) as by sets of homogeneous items (discrete verbal and
reading comprehension). This inference was also suggested by the results
discussed in the section on item-ability regressions.

Comparability of Ability Estimates Based on Homogeneous and Heterogeneous
Sets of Items

The review of factor analytic studies conducted on the GRE Aptitude
Test led to a decision to separate verbal items into mutually exclusive
sets of discrete verbal items and reading comprehension items because the
evidence indicated that the items on the verbal scale were measuring two
correlated factors. Consequently, all verbal items were calibrated at
least twice, once with a set of homogeneous items of like type, e.g.,
discrete verbal or reading comprehension, and once with a set of heterogeneous
items comprised of both discrete verbal and reading comprehension items.
This procedure produced three ability scores for each examinee: a verbal
ability score based on all verbal items (θ_V), a discrete verbal ability
score based on discrete verbal items (θ_DV), and a reading comprehension
ability score based on reading comprehension items (θ_R).

If discrete verbal items and reading comprehension items were measuring
the same attribute, then ability estimates based on each set of items should
be very highly correlated. On the other hand, if these different sets of
items were measuring distinct abilities, the expected correlation would
not be as high. Table 29 provides evidence relevant to assessing whether
the reading comprehension items and the discrete verbal items are measuring
the same attribute. It contains correlations among θ_V, θ_DV, and θ_R for
all four administrations.

It is clear in Table 29 that discrete verbal ability had a higher
correlation with verbal ability than did reading comprehension ability,
and that discrete verbal ability and reading comprehension ability were
less correlated with each other than with verbal ability. The three
correlations are .96 to .97 for discrete verbal ability and verbal
ability, .86 to .89 for reading comprehension ability and verbal ability,
and .73 to .77 for discrete verbal ability and reading comprehension
ability. Since estimated θ has about the same reliability as the usual
number-right test score, a correction for attenuation due to error of






TABLE 29

Correlations Among Ability Estimates for Verbal (V),
Discrete Verbal (DV) and Reading Comprehension (R) Scales

Admin
Date     Form               DV       V        R

12/79    K-ZGR2     DV    1.000    .959     .726
                    V              1.000    .860
                    R                       1.000
                    N = 3861

2/80     ZGR1       DV    1.000    .965     .764
                    V              1.000    .881
                    R                       1.000
                    N = 3581

4/80     K-ZGR3     DV    1.000    .965     .766
                    V              1.000    .886
                    R                       1.000
                    N = 4043

6/80     ZGR1       DV    1.000    .968     .746
                    V              1.000    .861
                    R                       1.000
                    N = 4351

6/80     3CGR1      DV    1.000    .970     .758
                    V              1.000    .863
                    R                       1.000
                    N = 2579






estimation is probably necessary. Assuming that this correction has
little differential effect on the correlations, the correlations
in Table 29 indicate that discrete verbal ability and reading comprehension
ability are distinct, highly correlated abilities.

Further evidence for the conclusion that reading comprehension
ability and discrete verbal ability are distinct, highly correlated
abilities is presented in Table 30, which contains correlations among
proportion-correct true scores for verbal, discrete verbal, and reading
comprehension abilities. Proportion-correct true score is obtained
by substituting ability estimates into the test characteristic curve,
which is a sum of the item characteristic curves for the items defining
the test, and dividing the result, which is the number-correct true score,
by the number of items in the test. Preference for correlations of
bounded difficulty parameters was one reason for examining
proportion-correct true score.
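The computation just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LOGIST implementation: the scaling constant D = 1.7, the function names, and the item parameters are all hypothetical choices made here.

```python
import math

D = 1.7  # assumed logistic scaling constant; not stated in the text


def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def number_correct_true_score(theta, items):
    """Test characteristic curve: sum of the item characteristic curves."""
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items)


def proportion_correct_true_score(theta, items):
    """Number-correct true score divided by the number of items."""
    return number_correct_true_score(theta, items) / len(items)


# Hypothetical (a, b, c) triples, for illustration only.
items = [(0.9, -1.0, 0.18), (1.1, 0.0, 0.17), (0.8, 1.2, 0.20)]

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(proportion_correct_true_score(theta, items), 3))
```

Because each item curve is bounded below by its c parameter and above by 1, the proportion-correct true score is bounded as well, unlike θ itself.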

The correlations in Table 30 present a range of .96 to .98 for the
discrete verbal-verbal correlation, a range of .88 to .90 for the reading
comprehension-verbal correlation, and a range of .73 to .80 for the
discrete verbal-reading comprehension correlation. These latter results,
like the results in Table 29, provide evidence for the existence of the
two distinct, highly correlated reading comprehension and discrete verbal
abilities.

The fourth column in Table 30 contains the correlations of the
variable V* with the discrete verbal, verbal, and reading comprehension
proportion-correct true scores. This variable, V*, is defined as the
sum of the discrete verbal number-correct true score and the reading
comprehension number-correct true score divided by the total number of
items, i.e., V* is a weighted composite of the discrete verbal and reading
comprehension proportion-correct true scores, where the weights are the
number of discrete verbal items and the number of reading comprehension
items, respectively.
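As a small worked sketch of this definition (the function name and the two proportion-correct values are hypothetical; the item counts 55 and 25 are the discrete verbal and reading comprehension counts listed for form ZGR1 in Table 32):

```python
def reconstructed_verbal(p_dv, n_dv, p_r, n_r):
    """V*: item-count-weighted composite of two proportion-correct true scores."""
    return (n_dv * p_dv + n_r * p_r) / (n_dv + n_r)


if __name__ == "__main__":
    # Hypothetical examinee: .50 on discrete verbal, .60 on reading comprehension.
    print(reconstructed_verbal(0.50, 55, 0.60, 25))
```

The numerator is simply the sum of the two number-correct true scores, so V* always lies between the two component proportions.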

The striking feature of the fourth column in Table 30 is the close
resemblance of the V* correlations to the verbal (V) correlations. For
all five administrations, V and V* are virtually perfectly correlated,
and their correlations with discrete verbal (DV) and reading comprehension
(R) are almost identical. Hence, Table 30 provides evidence for thinking
of the verbal true score dimension as a weighted composite of the discrete
verbal and reading comprehension dimensions. Table 31 provides further
support for this inference.

Table 31 contains means and standard deviations for the verbal (V),
discrete verbal (DV), reading comprehension (R), and reconstructed verbal
(V*) proportion-correct true scores for all five administrations. Note
that the maximum difference between verbal (V) and reconstructed verbal
(V*) means and standard deviations is .001, which provides further
support for viewing verbal ability as a weighted composite of discrete
verbal ability and reading comprehension ability.






TABLE 30

Correlations Among Proportion-Correct True Score Estimates
for Verbal (V), Discrete Verbal (DV), Reading Comprehension (R)
and Reconstructed Verbal (V*) Scales

Admin
Date     Form               DV       V        R        V*

12/79    K-ZGR2     DV    1.000    .963     .734     .968
                    V              1.000    .879     .996
                    R                       1.000    .8?2
                    N = 3861

2/80     ZGR1       DV    1.000    .961     .758     .968
                    V              1.000    .899     .996
                    R                       1.000    .898
                    N = 3581

4/80     K-ZGR3     DV    1.000    .962     .768     .971
                    V              1.000    .902     .995
                    R                       1.000    .899
                    N = 4043

6/80     ZGR1       DV    1.000    .971     .775     .971
                    V              1.000    .901     .999
                    R                       1.000    .903
                    N = 4351

6/80     3CGR1      DV    1.000    .980     .798     .98?
                    V              1.000    .89?     .999
                    R                       1.000    .898
                    N = 2379




TABLE 31

Summary Statistics for Verbal (V), Discrete Verbal (DV),
Reading Comprehension (R), and Reconstructed Verbal (V*)
Proportion-Correct True Score Estimates

Form                          DV       R        V        V*

ZGR1 (6/80)        Mean      .318     .613     .349     .348
                   S.D.      .132     .194     .133     .136

ZGR1 (2/80)        Mean      .323     .624     .334     .333
                   S.D.      .131     .193     .134     .13?

K-ZGR2 (12/79)     Mean      .360     .636     .390     .390
                   S.D.      .133     .183     .132     .152

K-ZGR3 (4/80)      Mean      .332     .631     .362     .363
                   S.D.      .142     .173     .144     .144

3CGR1 (6/80)       Mean      .347     .370     .333     .334
                   S.D.      .163     .163     .137     .137







Further evidence pertaining to the dimensionality of the verbal items
is also presented in Table 32, which contains correlations among observed
scores, with and without correction for attenuation due to measurement
error, on the verbal item types for four distinct samples of examinees who
took one of these four forms in June 1980: two versions of ZGR1 and two of
3CGR1 [version subscripts illegible]. The elements on the main diagonals of the four correlation
matrices in Table 32 are reliability estimates. An adaptation of
Kuder-Richardson formula 20 (KR-20) for formula-scored tests (Dressel, 1940)
produced the reliability estimates for sentence completions, analogies,
antonyms, and reading comprehension. These four modified KR-20 estimates
were used to estimate the reliability for the verbal scale via the formula

(4)    \mathrm{Rel}_V = 1 - \frac{\sum_{i=1}^{4} \mathrm{Var}_i \, (1 - \mathrm{Rel}_i)}{\mathrm{Var}_V} ,

where Rel_V and Var_V are the reliability and variance, respectively, of
the verbal scale, and Var_i and Rel_i are the variance and modified KR-20
reliability estimate of the ith scale, where i is either one of the three
discrete verbal item types or the reading comprehension item type. To
obtain the reliability estimate for the discrete verbal scale, the above
formula is used with the three discrete verbal item type variances and
reliabilities and the discrete verbal variance.
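Formula (4) can be sketched as a short Python function. The function name and the variances and reliabilities in the example are hypothetical and chosen only to show the arithmetic; they are not values from the report.

```python
def composite_reliability(part_variances, part_reliabilities, total_variance):
    """Formula (4): 1 - sum_i Var_i * (1 - Rel_i) / Var_total."""
    error_variance = sum(
        variance * (1.0 - reliability)
        for variance, reliability in zip(part_variances, part_reliabilities)
    )
    return 1.0 - error_variance / total_variance


if __name__ == "__main__":
    # Hypothetical item-type variances and modified KR-20 reliabilities.
    variances = [10.0, 12.0, 14.0, 20.0]
    reliabilities = [0.76, 0.73, 0.76, 0.86]
    print(round(composite_reliability(variances, reliabilities, 150.0), 3))
```

Passing only the three discrete verbal item types, with the discrete verbal variance as the total, gives the discrete verbal reliability in the same way.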

The elements to the left of the main diagonal are observed score
correlations, while the entries to the right are the same correlations
corrected for attenuation. Note that part-total correlations, such as the
five correlations with verbal score, were not corrected for attenuation.

The disattenuated correlations between discrete verbal and reading
comprehension are of primary interest. Since the reliabilities used to
correct the observed score correlations for attenuation are estimates of
item homogeneity, the reliabilities reported on the diagonals in Table 32
are probably underestimates. Hence, the disattenuated correlations in
this table can be viewed as overestimates of the true score correlations
among the verbal item types. The correlations between estimated proportion
correct true scores for discrete verbal and reading comprehension on the
June 1980 administrations of forms ZGR1 (r = .775) and 3CGR1 (r = .798),
reported in Table 30, fall between the upper bound disattenuated
correlations and the observed score correlations reported in Table 32, providing
further evidence for the hypothesis that the verbal ability measured by
the GRE Aptitude Test is composed of two distinct, highly correlated
reading comprehension and discrete verbal abilities.
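The standard correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. The sketch below applies it to the discrete verbal and reading comprehension entries for the two ZGR1 samples in Table 32; the function name is an assumption introduced here.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # (observed DV-R correlation, DV reliability, R reliability) from Table 32.
    for r, rel_dv, rel_r in ((0.710, 0.896, 0.855), (0.713, 0.901, 0.869)):
        print(round(disattenuate(r, rel_dv, rel_r), 3))
```

For the first sample this reproduces the tabled value of .811, and the true score correlation of .775 reported for ZGR1 in Table 30 falls between the observed .710 and that upper bound.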






Table 32

Correlations Among Verbal Item Types
With and Without Correction For Attenuation*

ZGR1 (N = 2,480)
                                                                  Number of
                          1      2      3      4      5      6      Items
1  Verbal               .929                                         80
2  Discrete Verbal      .957   .896                        .811      55
3  Sentence Compl.      .877   .888   .759   .930   .880   .864      17
4  Analogies            .854   .895   .693   .732   .978   .795      18
5  Antonyms             .831   .894   .669   .730   .761   .710      20
6  Reading Comp.        .882   .710   .696   .629   .573   .855      25

ZGR1 (N = 2,485)
                                                                  Number of
                          1      2      3      4      5      6      Items
1  Verbal               .934                                         80
2  Discrete Verbal      .956   .901                        .806      55
3  Sentence Compl.      .882   .898   .765   .946   .903   .852      17
4  Analogies            .859   .895   .710   .736   .969   .807      18
5  Antonyms             .863   .901   .697   .734   .779   .705      20
6  Reading Comp.        .886   .713   .695   .645   .580   .869      25

3CGR1 (N = 1,485)
                                                                  Number of
                          1      2      3      4      5      6      Items
1  Verbal               .929                                         75
2  Discrete Verbal      .974   .911                        .858      53
3  Sentence Compl.      .845   .845   .718   .894   .863   .899      13
4  Analogies            .863   .886   .653   .743   .909   .847      18
5  Antonyms             .889   .927   .677   .726   .858   .768      22
6  Reading Comp.        .864   .728   .677   .649   .632   .790      22

3CGR1 (N = 1,495)
                                                                  Number of
                          1      2      3      4      5      6      Items
1  Verbal               .931                                         75
2  Discrete Verbal      .975   .913                        .872      53
3  Sentence Compl.      .839   .841   .709   .917   .857   .909      13
4  Analogies            .873   .895   .664   .740   .939   .869      18
5  Antonyms             .897   .931   .670   .750   .863   .795      22
6  Reading Comp.        .874   .745   .684   .668   .660   .799      22






*Upper triangle has correlations corrected for attenuation;
diagonal has reliability estimates;
lower triangle has uncorrected correlations.






IRT EQUATING:

COMPARABILITY WITH LINEAR AND EQUIPERCENTILE EQUATING 

In preceding sections of this report, the reasonableness of the
assumptions of item response theory for the GRE Aptitude Test has been
assessed. Evidence has been presented that, to some extent, the assumption
of unidimensionality is violated within each section of the Aptitude Test.
Despite these violations, the analysis of item-ability regressions indicated
that, for the verbal items and two of the three analytical item types
(logical diagrams and analytical reasoning), the three-parameter logistic
model fit the data well. The quantitative item types, particularly
quantitative comparison items, and the analysis of explanations items were
fit less well by the model. Some quantitative comparison and analysis of
explanations items showed local instances of an inverse relationship
between the probability of responding correctly to the item and estimated
theta (i.e., nonmonotonicity). Nonetheless, IRT-based equating might
well be robust to violations of these assumptions. This section will
compare a variety of equatings for three forms of the GRE Aptitude Test.
The equating methods will be described, the equating plan will be outlined,
and the results of the various equatings will be presented, compared, and
analyzed.



Equating Methods

In practice, despite efforts by test development experts, two forms
of the GRE Aptitude Test cannot be expected to be of precisely equal
difficulty. Since it is inherently unfair to compare without adjustment
the raw scores of examinees who take two tests that differ in difficulty,
equating procedures have been developed to transform scores from different
test forms to a single scale. These equating procedures each consist of
two parts, a data collection design and an analytical method to determine
the appropriate transformation.

There are three basic designs for data collection: single group,
equivalent group, and anchor test (Lord, 1975a). Equatings considered in
this study are based on the latter two designs. In the equivalent-group
design, the old form (form already on scale) and new form (form to be
scaled) are administered to random or otherwise equivalent samples from
the same population. In practice this is done through a procedure known
as spiralling (Conrad, Trismen, & Millar, 1977). Test books are packaged
alternating the old and new forms and then administered within each test
center so that half the examinees within each test center take each form.
The anchor-test design is one in which one form of the test is administered
to one group, another form to another group, and a common anchor test to
both groups. The anchor test allows the equating transformation to take
the difference in abilities of the two groups into account; the
equivalent-group method depends on spiralling to minimize ability differences.
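Spiralling as described above amounts to handing out forms in strict alternation within a test center. A minimal sketch, assuming a hypothetical `spiral_assign` helper and examinees identified by seating order:

```python
def spiral_assign(examinees, forms=("old", "new")):
    """Assign forms in alternating (spiralled) order within one test center."""
    return {examinee: forms[index % len(forms)]
            for index, examinee in enumerate(examinees)}


if __name__ == "__main__":
    center = ["seat-1", "seat-2", "seat-3", "seat-4", "seat-5", "seat-6"]
    print(spiral_assign(center))
```

With an even number of examinees each form goes to exactly half of the center, which is what makes the two groups approximately equivalent in ability.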

Three major analytical methods to determine equating transformations
were used in this research: equipercentile, linear, and item response






theory based true score equating. In equipercentile equating, a
transformation is chosen such that scores from the two tests will be considered
equated if they correspond to the same percentile rank in some group of
examinees. For linear equating, the chosen transformation is such that
scores from the two tests will be considered equated if they correspond to
the same number of standard deviations from the mean in some group of
examinees. The transformation chosen for item response theory based
equating is such that true scores from the two tests will be considered
equated if they correspond to the same estimated theta (see Lord, 1980a,
chapter 13.5 for a more complete description of item response theory based
true score equating).
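Two of these definitions can be illustrated with a short Python sketch: linear equating, and IRT true score equating under the three-parameter logistic model. This is an illustration under stated assumptions rather than the procedure used in the study: the scaling constant D = 1.7, the bisection search, all function names, and the item parameters are hypothetical. Equipercentile equating is omitted for brevity.

```python
import math

D = 1.7  # assumed logistic scaling constant


def linear_equate(x, mean_new, sd_new, mean_old, sd_old):
    """Old-form score lying the same number of S.D.s from its mean as x does."""
    return mean_old + sd_old * (x - mean_new) / sd_new


def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Number-correct true score: sum of item characteristic curves."""
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items)


def theta_for_true_score(score, items, low=-8.0, high=8.0, tol=1e-8):
    """Bisection search; valid because true_score increases with theta
    when every a > 0. The score must lie between sum(c) and len(items)."""
    while high - low > tol:
        middle = (low + high) / 2.0
        if true_score(middle, items) < score:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


def irt_true_score_equate(score_new, items_new, items_old):
    """Old-form true score at the theta that yields score_new on the new form."""
    theta = theta_for_true_score(score_new, items_new)
    return true_score(theta, items_old)


# Hypothetical (a, b, c) item parameters, assumed already on a common scale.
new_form = [(0.9, -1.0, 0.18), (1.1, 0.0, 0.17), (0.8, 1.2, 0.20)]
old_form = [(a, b - 0.5, c) for a, b, c in new_form]  # an easier old form

if __name__ == "__main__":
    print(linear_equate(60, 50, 10, 55, 12))
    print(round(irt_true_score_equate(1.8, new_form, old_form), 3))
```

The IRT step presupposes that both sets of item parameters are on the same scale, which is what the linking procedures discussed next are meant to secure.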

Nine variants of item response theory based equating were performed
in this research. These variants differ along three dimensions: (a) the
data collection design: equivalent group or anchor test; (b) the item
parameter linking procedure; and (c) the composition of the item sets used
in the LOGIST calibrations. For the equivalent-group design, the separate
calibrations for the old and new forms are assumed to be on the same scale
based on group equivalence, or the items in the new form appeared in an
experimental section of the old form and were calibrated in a single
LOGIST run with the old form. For the anchor-test design, the parameter
estimates were either linked by the Lord-Stocking robust procedure (further
divided into number of links to the base scale: either one or two) or
were not linked. Three variants of the composition of the item sets used
in the LOGIST calibrations were investigated: both old and new forms had
a single calibration per form of heterogeneous item types; the old form
had a heterogeneous calibration, but the new form had two separate
homogeneous calibrations; and both the old and new forms had two
homogeneous calibrations per form. Not all possible combinations of
these dimensions were used in this research.

Table 33 presents a concise description of the nine IRT equating
variants studied in this research and indicates designations (to be used
through the rest of this report) for each variant. Tables 34, 35, and 36
indicate which equating variants were used (respectively) for the verbal,
quantitative, and analytical sections. Table 37 describes the three
non-IRT equating variants. Tables 38, 39, and 40 indicate which of these
variants were used for the verbal, quantitative, and analytical sections
of each form.

The equating variant designations given in Tables 33 and 37 follow a
straightforward pattern. The first character in the designation (I, E,
or L) indicates whether the equating method is IRT, Equipercentile, or Linear.
The second character (E or A) indicates the general data collection
design, Equivalent group or Anchor test. The IRT equating variants are
designated with three or four characters. The third character (S, P, L,
or W) provides information about the linking of item parameter scales:
separate calibrations whose scale equivalence is assumed based on Spiralling,
item parameters Precalibrated in the variable section of the old form,
item parameter scales linked using the Lord-Stocking robust linking
procedure, or equating Without linking item parameters (Lord, 1981). The




fourth character (V, H, or 2) indicates either the composition of item
sets used in parameter estimation (V: a single heterogeneous calibration
of all Verbal items for the old form and two homogeneous calibrations,
reading comprehension and discrete verbal separately, for the new form;
H: two Homogeneous calibrations for both the old and the new form) or,
in the case of the IAL2 equating, that there were 2 links in the chain
to put the item parameter estimates on scale.
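The designation scheme can be summarized as a small lookup, sketched here in Python. The dictionary and function names are assumptions made for illustration; the letter meanings follow the description above.

```python
METHOD = {"I": "IRT", "E": "Equipercentile", "L": "Linear"}
DESIGN = {"E": "Equivalent group", "A": "Anchor test"}
LINKING = {
    "S": "Separate calibrations, scale equivalence assumed from spiralling",
    "P": "Items precalibrated in the variable section of the old form",
    "L": "Lord-Stocking robust linking procedure",
    "W": "Equating without linking item parameters",
}
FOURTH = {
    "V": "Heterogeneous calibration for old form, homogeneous for new form",
    "H": "Homogeneous calibrations for both old and new forms",
    "2": "Two links in the chain to the base scale",
}


def decode(designation):
    """Expand an equating variant designation such as 'IALH' or 'LE'."""
    code = designation.upper()
    parts = {"method": METHOD[code[0]], "design": DESIGN[code[1]]}
    if code[0] == "I":  # only IRT variants carry a third (and fourth) character
        parts["linking"] = LINKING[code[2]]
        if len(code) == 4:
            parts["fourth"] = FOURTH[code[3]]
    return parts


if __name__ == "__main__":
    for name in ("IES", "IEP", "IAL2", "IALH", "EE", "LA"):
        print(name, decode(name))
```

For example, `decode("IAL2")` reports an IRT equating under the anchor-test design with Lord-Stocking linking and two links to the base scale.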



Table 33

Variants of IRT Equating and Their Designations

Data collection design (columns):
  Equivalent Group
    A  Separate calibrations of operational items in old and new forms
       assumed to be on scale based on group equivalence
    B  All operational items in new form precalibrated in variable
       section of old form
  Anchor Test
    C  Lord-Stocking robust linking procedure, 1 link to base scale
    D  Lord-Stocking robust linking procedure, 2 links to base scale
    E  Equating without linking item parameters (Lord, 1981)

Composition of item sets used
in parameter estimation*             A       B       C       D       E

Heterogeneous for old and
  new forms                          IES     IEP     IAL     IAL2    IAW
Heterogeneous for old form;
  homogeneous for new form           IESV    **      IALV    **      **
Homogeneous for old and
  new forms                          IESH    **      IALH    **      **

*The composition of item sets used in parameter estimation was varied only for verbal, for which discrete
verbal items and reading comprehension items were calibrated separately in some analyses.

**These variants were not studied in this research.



Table 34
Verbal Equatings
IRT Variants and Forms Analyzed

Data collection design (columns):
  Equivalent Group
    A  Separate calibrations of operational items in old and new forms
       assumed to be on scale based on group equivalence
    B  All operational items in new form precalibrated in variable
       section of old form
  Anchor Test
    C  Lord-Stocking robust linking procedure, 1 link to base scale
    D  Lord-Stocking robust linking procedure, 2 links to base scale
    E  Equating without linking item parameters (Lord, 1981)

Composition of item sets used
in parameter estimation*        A       B       C                      D        E

Heterogeneous for old and       3CGR1   3CGR1   ZGR1, K-ZGR2, K-ZGR3   K-ZGR3   K-ZGR2, K-ZGR3
  new forms
Heterogeneous for old form;     3CGR1   **      ZGR1, K-ZGR2, K-ZGR3   **       **
  homogeneous for new form
Homogeneous for old and         3CGR1   **      ZGR1, K-ZGR2, K-ZGR3   **       **
  new forms

**These variants were not studied in this research.






Table 35
Quantitative Equatings
IRT Variants and Forms Analyzed

Data collection design (columns):
  Equivalent Group
    A  Separate calibrations of operational items in old and new forms
       assumed to be on scale based on group equivalence
    B  All operational items in new form precalibrated in variable
       section of old form
  Anchor Test
    C  Lord-Stocking robust linking procedure, 1 link to base scale
    D  Lord-Stocking robust linking procedure, 2 links to base scale
    E  Equating without linking item parameters (Lord, 1981)

Composition of item sets used
in parameter estimation         A       B       C                      D        E

Heterogeneous for old and       3CGR1   3CGR1   ZGR1, K-ZGR2, K-ZGR3   K-ZGR3   **
  new forms
Heterogeneous for old form;     **      **      **                     **       **
  homogeneous for new form
Homogeneous for old and         **      **      **                     **       **
  new forms

**These variants were not studied in this research.






Table 36
Analytical Equatings

Data collection design (columns):
  Equivalent Group
    A  Separate calibrations of operational items in old and new forms
       assumed to be on scale based on group equivalence
    B  All operational items in new form precalibrated in variable
       section of old form
  Anchor Test
    C  Lord-Stocking robust linking procedure, 1 link to base scale
    D  Lord-Stocking robust linking procedure, 2 links to base scale
    E  Equating without linking item parameters (Lord, 1981)

Composition of item sets used
in parameter estimation         A       B       C       D       E

Heterogeneous for old and       3CGR1   3CGR1   **      **      **
  new forms
Heterogeneous for old form;     **      **      **      **      **
  homogeneous for new form
Homogeneous for old and         **      **      **      **      **
  new forms

**These variants were not studied in this research.











Table 37
Variants of Non-IRT Equating
and Their Designations

                         Data Collection Design
Method              Equivalent Group     Anchor Test

Equipercentile      EE                   **
Linear              LE                   LA




Table 38
Verbal Equatings
Non-IRT Variants and Forms Analyzed

                         Data Collection Design
Method              Equivalent Group     Anchor Test

Equipercentile      3CGR1, K-ZGR3        **
Linear              3CGR1, K-ZGR3        K-ZGR2




Table 39
Quantitative Equatings
Non-IRT Variants and Forms Analyzed

                         Data Collection Design
Method              Equivalent Group     Anchor Test

Equipercentile      3CGR1                **
Linear              3CGR1, K-ZGR3*       K-ZGR2

*Equated through a combination of single-group and equivalent-group
designs; see text in equating plan section.
**This variant was not studied in this research.






Table 40
Analytical Equatings
Non-IRT Variants and Forms Analyzed

                         Data Collection Design
Method              Equivalent Group     Anchor Test

Equipercentile      3CGR1                **
Linear              3CGR1                K-ZGR2

**This variant was not studied in this research.



J- 



ERIC 



Equating Plan 







All IRT equatings used form ZGR1 as the old form. Parameter estimates
for the old form ZGR1 items were based on the June 1980 administration,
with the exception of the IAW method, which used data from the February
1980 administration. The linear and equipercentile equating for form
3CGR1 also used form ZGR1 administered in June 1980 as the old form. The
verbal linear and equipercentile equatings of form K-ZGR3 used form ZGR1
administered in December 1977 as the old form. The quantitative linear
equating of K-ZGR3 was complicated by the changing of one item. The
quantitative section of form K-ZGR3 was originally equated to form ZGR1
administered in December 1977 using the equivalent-group design. When
one item was changed, the unchanged items were used in equating to the
original, prechange form using data from April 1979, and then the total
quantitative section including the revised item was equated to the 54
unchanged items using data from the April 1980 administration.

Figures 13, 14, and 15 present the equating plans for the verbal,
quantitative, and analytical sections. Although, in the most obvious
sense, ZGR1 administered in June 1980 (or February 1980 for the IAW
equatings) is the old form for the IRT equatings (that is, the item
parameters estimated from that administration's data were used), it is the
item parameter scale linking (with the exception of the IAW method) that
is most analogous to the equating links in linear or equipercentile
equating plans. It is during these links that statistical error and bias
can enter the equating system. The numbers in the boxes in Figures 13,
14, and 15 indicate the numbers of items in the operational section.



Judging the Adequacy of Equatings

Unfortunately, there is no unarguable objective criterion available
to judge the adequacy of the equatings in this research. It is
inappropriate to use the linear or equipercentile equatings as a criterion or
method, particularly since (with the possible exception of the quantitative
section in form 3CGR1) the assumptions upon which the linear and
equipercentile methods are based are violated. Just as we have little evidence
concerning the robustness of IRT equating to violations of its assumptions,
we also have little evidence concerning the robustness of most of the
classical methods (see, however, Marco, Petersen, & Stewart, 1979, and
Petersen, Marco, & Stewart, in press, for a detailed analysis of the
robustness of many anchor-test design methods). Further consideration of
the assumptions of the equating variants used in this study, evidence
concerning the violation of these assumptions, and interpretation of the
equating results based on this evidence will be presented in the discussion
section of this chapter.






Figure 13
Equating Plan for Verbal Scales*

[Diagram of equating links by administration date and form.
Administration dates: 1/77, 10/77, 12/77, 12/79, 2/80, 4/80, 6/80.
Forms: base scale, ZGR1, K-ZGR2, K-ZGR3, 3CGR1.
Numbers in boxes (operational items): 80 or 75.
Link labels: LA; LE, EE; IAL, IALV, IALH, IAW; IAL2; IES, IESV, IESH, IEP.]

*The four administrations of form ZGR1, two administrations of form K-ZGR2,
and two administrations of form K-ZGR3 are each assumed to be intraequated
by virtue of the respective identity of their items.



Figure 14
Equating Plan for Quantitative Scales*

[Diagram of equating links by administration date and form.
Administration dates: 1/77, 10/77, 12/77, 4/79, 12/79, 2/80, 4/80, 6/80.
Forms: base scale, ZGR1, K-ZGR2, K-ZGR3(1), K-ZGR3(2), 3CGR1.
Numbers in boxes (operational items): 55 or 54.
Link labels: LA; LE; IAL; IAL2; IES, IEP, LE, EE.]

*The four administrations of form ZGR1, two administrations of form K-ZGR2,
and two administrations of form K-ZGR3(1) are each assumed to be intraequated
by virtue of the respective identity of their items.

**See text.



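Figure 15
Equating Plan for Analytical Scales

[Diagram of equating links by administration date and form.
Administration date: 6/80.
Forms: ZGR1, 3CGR1.
Numbers in boxes (operational items): 70, 66.
Link label: IES, IEP, LE, EE.]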






Results 

Verbal equatings. Table 41 presents means, standard deviations, and
skewnesses based on the various verbal equatings. Two factors went into
the computation of these summary statistics: the relationship between
raw and scaled score, as produced by the various equatings, and a frequency
distribution of raw scores. This frequency distribution is simply a
convenient vehicle for converting the vectors of scaled scores into the
more easily interpretable scalar summary statistics presented. Any
reasonable distribution would have been appropriate. The distributions
used were based on the groups of examinees who took each of the forms when
they were first administered. The equating tables and frequency
distributions used to compute Table 41 are presented in Appendix A.
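
The computation described above can be sketched as follows. This is a minimal illustration, not the program used in the study: the function name, the use of population (divide-by-N) moments, and the small conversion table and frequencies are all assumptions made for the example.

```python
import math

def summary_stats(scaled_scores, frequencies):
    """Frequency-weighted mean, SD, and skewness of scaled scores.

    scaled_scores[i] is the scaled score assigned to the i-th raw score
    by an equating; frequencies[i] is the number of examinees with that
    raw score. Population (divide-by-N) moments are an assumption here.
    """
    n = sum(frequencies)
    pairs = list(zip(scaled_scores, frequencies))
    mean = sum(s * f for s, f in pairs) / n
    variance = sum(f * (s - mean) ** 2 for s, f in pairs) / n
    sd = math.sqrt(variance)
    skew = sum(f * (s - mean) ** 3 for s, f in pairs) / (n * sd ** 3)
    return mean, sd, skew

# Hypothetical conversion table and raw-score frequencies (not study data).
scaled = [200, 300, 400, 500, 600]
freq = [1, 2, 4, 2, 1]
mean, sd, skew = summary_stats(scaled, freq)
```

Applying the same frequency distribution to each equating's conversion table is what makes the resulting means and standard deviations comparable across equating variants.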

It should be noted that the means and standard deviations for the
linear and equipercentile equatings based on the equivalent-group design
are virtually identical. This is to be expected as they are based on
identical data and the linear equating sets the first two moments of the
old and new form distributions equal and the equipercentile equating sets
all moments of the two distributions equal. Since only five significant
digits were retained in the computations, minor differences due to small
losses in accuracy in the computation of the standard deviations are
noticeable.
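
The moment-matching property noted above can be illustrated with a short sketch. The function names and the two small score samples are hypothetical, and the equipercentile function is a deliberately crude stand-in (it assumes equal sample sizes and no tied scores) rather than the smoothed procedure used operationally.

```python
import statistics

def linear_equate(x, new_scores, old_scores):
    """Place new-form score x on the old-form scale by matching mean and SD."""
    mu_new = statistics.fmean(new_scores)
    sd_new = statistics.pstdev(new_scores)
    mu_old = statistics.fmean(old_scores)
    sd_old = statistics.pstdev(old_scores)
    return mu_old + sd_old * (x - mu_new) / sd_new

def equipercentile_map(new_scores, old_scores):
    """Crude equipercentile mapping: the k-th smallest new-form score is
    mapped to the k-th smallest old-form score (assumes equal sample sizes
    and no tied scores)."""
    if len(new_scores) != len(old_scores):
        raise ValueError("this sketch assumes equal sample sizes")
    return dict(zip(sorted(new_scores), sorted(old_scores)))

# Hypothetical raw scores from two equivalent groups (not study data).
new_form = [10, 12, 15, 19, 24]
old_form = [11, 14, 16, 18, 26]

linear_scores = [linear_equate(x, new_form, old_form) for x in new_form]
mapping = equipercentile_map(new_form, old_form)
equi_scores = [mapping[x] for x in new_form]
```

After linear equating, the new-form scores share the old form's mean and standard deviation; after the equipercentile mapping, they reproduce the old form's entire distribution, and hence its mean and standard deviation as well.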

Figures 16, 17, 18, and 19 plot the various equatings for the verbal
sections of, respectively, forms ZGR1, K-ZGR2, K-ZGR3, and 3CGR1. This
type of plot tends to point out the similarities between equatings more
than the differences. A residuals plot is often more informative. In such
a plot the difference between each equating and a comparison equating is
plotted against raw score. Figures 20, 21, 22, and 23 are residuals plots
using the IEP or IAL equating as the comparison, whichever is available.
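
A residuals plot of this kind starts from a simple subtraction at each raw score. The sketch below assumes each equating is stored as a list of scaled scores indexed by raw score; the function name and the conversion values are hypothetical.

```python
def equating_residuals(conversions, comparison):
    """Difference between each equating and the comparison equating,
    raw score by raw score. `conversions` maps an equating label to a
    list of scaled scores indexed by raw score."""
    base = conversions[comparison]
    return {
        name: [round(score - ref, 2) for score, ref in zip(scores, base)]
        for name, scores in conversions.items()
        if name != comparison
    }

# Hypothetical conversion tables for four raw scores (not study data).
conversions = {
    "IAL": [200.0, 310.0, 420.0, 530.0],
    "LA": [205.0, 312.0, 419.0, 526.0],
}
residuals = equating_residuals(conversions, comparison="IAL")
```

Plotting each residual list against raw score removes the dominant common trend of the conversion lines, which is why small differences between equatings become visible.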

Quantitative equatings. Table 42 was computed in the same way that
Table 41 was computed and compares the various quantitative equatings.
The equating tables and frequency distributions used to compute Table 42
are presented in Appendix A. Figures 24, 25, 26, and 27 are plots of the
various quantitative equatings for forms ZGR1, K-ZGR2, K-ZGR3, and 3CGR1,
respectively. Figures 28, 29, 30, and 31 are residuals plots using the IEP
or IAL (whichever is available) equating as the comparison.

Analytical equatings. Table 43 presents the means, standard deviations,
and skewnesses based on the analytical equatings of form 3CGR1. The
equating tables and frequency distributions used to compute Table 43 are
presented in Appendix A. Figures 32 and 33 are, respectively, a plot of
the equatings and a residuals plot (using the IEP equating as the comparison)
for the analytical section of form 3CGR1.



Discussion of Equatings 

Lord (1980a, chapter 13) states that two tests cannot be equated
unless they are perfectly reliable or strictly parallel. The first case






Table 41

Verbal Equatings(a)
Means, Standard Deviations, and Skewnesses

                                                Forms
                3CGR1                  K-ZGR3                 K-ZGR2                  ZGR1
Variant     Mean    S.D.  Skew     Mean    S.D.  Skew     Mean    S.D.  Skew     Mean    S.D.  Skew

IES       473.27  125.14   .14        *       *     *        *       *     *        *       *     *
IESV      475.80  123.39   .13        *       *     *        *       *     *        *       *     *
IESH      473.39  126.51   .15        *       *     *        *       *     *        *       *     *
IEP       473.81  125.47   .18        *       *     *        *       *     *        *       *     *
IAL            *       *     *   504.93  122.19   .08   496.68  125.14   .05   500.81  128.04   [?]
IALV           *       *     *   506.26  119.40   .12   500.46  120.12   .04   502.98  124.65   .02
IALH           *       *     *   504.54  122.58   .11   498.66  123.30   .05   501.26  127.78   .02
IAL2           *       *     *   504.22  122.13   .08        *       *     *        *       *     *
IAW            *       *     *   504.66  123.23   .14   503.18  125.66   .08        *       *     *
EE        473.29  123.30   .20   507.70  124.23   .03        *       *     *        *       *     *
LE        473.29  123.35   .10   507.70  124.20   .02        *       *     *        *       *     *
LA             *       *     *        *       *     *   502.16  126.26  -.01   501.69  126.75   .02

(a) The cells in this table in which asterisks appear represent equatings that were not
carried out in this study.



Figure 16

VERBAL EQUATING GRAPHS - FORM ZGR1

[Plot of converted scores against formula score.]

Figure 17

VERBAL EQUATING GRAPHS - FORM K-ZGR2

[Plot of converted scores against formula score.]

Figure 18

VERBAL EQUATING GRAPHS - FORM K-ZGR3

[Plot of converted scores against formula score.]

Figure 19

VERBAL EQUATING GRAPHS - FORM 3CGR1

[Plot of converted scores against formula score.]



Figure 20

VERBAL EQUATING RESIDUALS GRAPH - FORM ZGR1

[Plot of difference between converted scores against formula score.
Curves: IAL, IALH, IALV, LA. Marked curve: see text.]

Figure 21

VERBAL EQUATING RESIDUALS GRAPH - FORM K-ZGR2

[Plot of difference between converted scores against formula score.
Curves: IAL, IAW, IALH, IALV, LA. Marked curve: see text.]

Figure 22

VERBAL EQUATING RESIDUALS GRAPH - FORM K-ZGR3

[Plot of difference between converted scores against formula score.
Curves: IAL, IAW, IAL2, IALH, IALV, EE, LE. Marked curve: see text.]

Figure 23

VERBAL EQUATING RESIDUALS GRAPH - FORM 3CGR1

[Plot of difference between converted scores against formula score.
Curves: IEP, IES, IESH, IESV, EE, LE. Marked curve: see text.]



Table 42

Quantitative Equatings(a)
Means, Standard Deviations, and Skewnesses

                                                Forms
Equating        3CGR1                  K-ZGR3                 K-ZGR2                  ZGR1
Variant     Mean    S.D.  Skew     Mean    S.D.  Skew     Mean    S.D.  Skew     Mean    S.D.  Skew

IES       499.75  123.38   .15        *       *     *        *       *     *        *       *     *
IEP       494.81  123.65   .12        *       *     *        *       *     *        *       *     *
IAL            *       *     *   493.18  128.91   .04   530.09  127.48  -.11   526.55  133.75  -.10
IAL2           *       *     *   492.98  130.75   .04        *       *     *        *       *     *
EE        498.65  130.39   .01        *       *     *        *       *     *        *       *     *
LE        498.63  130.31   .17   486.06  134.94   .18        *       *     *        *       *     *
LA             *       *     *        *       *     *   525.55  133.33  -.01   524.50  133.47  -.07

(a) The cells in this table in which asterisks appear represent equatings that were not
carried out in this study.

Figure 26

QUANTITATIVE EQUATING GRAPHS - FORM K-ZGR3

[Plot of converted scores against formula score.]

Figure 27

QUANTITATIVE EQUATING GRAPHS - FORM 3CGR1

[Plot of converted scores against formula score.]

Figure 28

QUANTITATIVE EQUATING RESIDUALS GRAPH - FORM ZGR1

[Plot of difference between converted scores against formula score.
Curves: IAL, LA. Marked curve: see text.]

Figure 29

QUANTITATIVE EQUATING RESIDUALS GRAPH - FORM K-ZGR2

[Plot of difference between converted scores against formula score.
Curves: IAL, LA. Marked curve: see text.]

Figure 30

QUANTITATIVE EQUATING RESIDUALS GRAPH - FORM K-ZGR3

[Plot of difference between converted scores against formula score.]

Figure 31

QUANTITATIVE EQUATING RESIDUALS GRAPH - FORM 3CGR1

[Plot of difference between converted scores against formula score.
Curves: IEP, IES, EE, LE. Marked curve: see text.]



Table 43

Analytical Equatings
Means, Standard Deviations, and Skewnesses

Equating               Form 3CGR1
Variant          Mean    S.D.    Skew




Figure 32

ANALYTICAL EQUATING GRAPHS - FORM 3CGR1

[Plot of converted scores against formula score.]

Figure 33

ANALYTICAL EQUATING RESIDUALS GRAPH - FORM 3CGR1

[Plot of difference between converted scores against formula score.
Curves: IEP, IES, EE, LE. Marked curve: see text.]

is not possible and in the second case, equating is not necessary. Assuming
that we never have strictly parallel tests (and this assumption will be
made throughout the rest of this chapter), and given the impossibility of
equating fallible tests, one can still attempt to adjust scores as
equitably as possible. The various equating models examined as part of
this research are based on a variety of assumptions and are affected by a
variety of factors. In order to judge the operational feasibility of IRT
equating it is important to consider these factors and their potential
though unknown effects on IRT, linear, and equipercentile equating methods
and the equivalent-group and anchor-test data collection designs.

All equating, as mentioned previously, requires perfectly reliable
tests. Additionally, all equating methods require that the tests to be
equated are unidimensional (Morris, in press). How then do other
assumptions (and the potential effect of violation of these assumptions) differ
for IRT, equipercentile, and linear equating models?

Violation of the assumption of unidimensionality might lead to more
serious consequences for IRT equating than for linear or equipercentile.
This is because IRT is a stronger, more specific model; that is, IRT
assumes unidimensionality explicitly at the item level. In contrast,
all that is required for linear and equipercentile equating is
unidimensionality at the test level. Each, however, requires unidimensionality in
order to establish a single unambiguous, constant metric. Thus, the
possible difference in effect of the violation of unidimensionality is
unclear.

Some equating problems are based on the constraints of available
data. The sparseness of data for low ability examinees makes it difficult
to estimate the pseudoguessing parameter. Lack of appropriate data can
also make it difficult to estimate the discrimination and difficulty
parameters of very easy or very difficult items. Additionally, items that
discriminate very poorly have poorly determined difficulty parameters
(Kingston and Dorans, 1981, give an example of an item with parameters
estimated on two samples of over 1,500 examinees where the estimate of b
varied from more than +1.5 to less than -1.5). Similarly, equipercentile
equating frequently suffers from a sparseness of data at the extremes of
the score scale, which can lead to poor equating at those extremes. To a
lesser extent, linear equating can be affected by outlying values having
an undue influence on the mean and standard deviation. With the sample
sizes typically used in equating the GRE Aptitude Test, however, this does
not cause any difficulties.

Though Lord has shown that equating nonparallel tests requires
perfect reliability, different equatings are probably differently affected
by both imperfect reliability and differences in reliability between old
and new test forms. It is likely that equating methods based on true
score estimates (whether they are based on IRT or classical methods) are
less adversely affected, at least by differences in reliability.

Even if a lack of parallelism between test forms is attributable
solely to differences in item difficulties and/or discriminations (and is
unrelated to multidimensionality), different equating methods will be
influenced differently. Lack of statistical parallelism between forms
results in a curvilinear equating relationship. We know that we cannot
produce strictly parallel tests and that, if we could, equating would be
unnecessary (Lord, 1980a). Thus, it is clear that linear equating can
never precisely define the relationship between test scores on different
forms. In many circumstances the departures from linearity appear minor,
but, as test forms become less parallel, linear equating becomes less
appropriate. Jaeger (1981) presents some indices for investigating
whether linear or equipercentile methods are appropriate for
equating.

Just as sparseness of data at the extremes presents a practical
problem for some equating methods, discreteness of data can present
estimation problems (Braun & Holland, in press; Potthoff, in press).
Morris (in press) suggested linear equating might be preferable to
equipercentile equating if there are "too few" items, but did not define "too
few." Potthoff (in press) suggested not rounding formula scores before
equipercentile equating or using IRT based equating to avoid problems
caused by data discreteness.

Data collection designs necessitated by administrative complexities
can lead to other problems with equating. The anchor-test design allows
one to adjust for differences in examinee ability. There is evidence,
however, that as the difference in ability between the two groups becomes
larger, the quality of the equating based on the anchor design decreases
(Marco, et al., 1979; Petersen, et al., in press). Since IRT equating is
based on item parameters that are invariant with respect to examinee
ability, it may be more resistant to this problem. This is supported by
the Marco, et al. results.

The equivalent-group design, as it is typically used for practical
reasons, also presents a problem. When an old and new form are
spiralled, the old form has previously been exposed. Some of the examinees
may have previously taken the old form and thus might be expected to
perform better than their fellow examinees who have either taken a different
old form or have not previously taken the test. Examinees taking the new
form cannot experience a comparable benefit. Thus mean scores, to some
small extent, may be artificially high on the old form compared to the new
form and might consistently make the old form seem easier (although
probably to an unnoticeably small extent) than it is. Such a systematic
bias might lead to an eventual scale drift.

IRT based equating, as we have chosen to implement it, is not affected
by speededness in the same way as are linear and equipercentile equating.
To minimize multidimensionality, contiguous items to which an examinee has
made no response and which appear at the end of a separately timed section
were coded as "not-reached" and were not used in the estimation of the
examinee's ability. Likewise, these "not-reached" items were not used
in estimating the parameters of the items. Thus, the IRT method attempts
to equate a more unidimensional ability metric. Since equating (as
commonly used) provides a scaled score that is a function of an observed
score, and since these observed scores have variance due to speededness,
IRT equating based on item data including "not-reached" items might be
subject to some problems that do not affect classical equating methods.
If two forms of a test differ in speededness, IRT based equating,
inappropriately, will not reflect this. The resulting bias in equating
should be trivial if the variance due to the speed factor is very small
compared to the variance due to the power factor or if the difference in
speededness is quite small.
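
The "not-reached" coding rule described above can be sketched as follows. The response coding (1 = right, 0 = wrong, None = no response), the "NR" marker, and the function name are assumptions for illustration; they are not taken from the calibration software used in the study.

```python
NOT_REACHED = "NR"  # hypothetical marker for not-reached items

def code_not_reached(responses, omit=None):
    """Recode the trailing run of unanswered items in a separately timed
    section as not-reached. Omits that are followed by at least one
    answered item are left as ordinary omits."""
    coded = list(responses)
    last = len(coded)
    # Walk back from the end of the section over contiguous non-responses.
    while last > 0 and coded[last - 1] is omit:
        last -= 1
    for i in range(last, len(coded)):
        coded[i] = NOT_REACHED
    return coded

# Hypothetical response string for a six-item section.
section = [1, 0, None, 1, None, None]
coded = code_not_reached(section)
```

Here the omit at the third item is retained because a later item was answered, while the two unanswered items at the end of the section are treated as not-reached and would be excluded from ability and item parameter estimation.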

Verbal equatings. Table 41 shows that most verbal equatings produced
similar results. Several findings are notable. As mentioned earlier,
separate calibrations of discrete verbal and reading comprehension item
sets were performed to investigate dimensionality. For both the
equivalent-group design (IES, IESV, IESH) and the anchor-test design (IAL, IALV,
IALH) the effect of multidimensionality and item calibration design was
further investigated with three equatings for each test form (see the
equating method section of this chapter for greater detail). If the
verbal section of the GRE was perfectly unidimensional or if IRT equating
was highly robust to violations of unidimensionality, there would be no
systematic differences among the three equatings; the only differences
would be due to sampling error. If dimensionality is a factor, one would
expect the IES and IESV (or IAL and IALV) equatings to be more different
from each other than from the IESH (or IALH) equating. Examination of
Table 41 shows this to be the case. Surprisingly, there is very little
difference between the IES and IESH equatings and IAL and IALH equatings.
The differences between means based on the two equatings for forms 3CGR1,
K-ZGR3, and ZGR1 are .12, .39, and .45, respectively. The difference
between standard deviations is somewhat larger for one of the three forms:
1.37 versus .39 and .26. Form K-ZGR2 shows a somewhat larger discrepancy:
1.98 for the means and 1.84 for the standard deviations.

Form ZGR1 allows the most straightforward assessment of IAL, IALV,
and IALH equating. In this one case, the LA equating is a true criterion
since form ZGR1 has been equated to itself, and the LA statistics are
based on given (and, for our purposes, we can assume arbitrary) scaling
parameters that are also part of the IAL, IALV, and IALH equatings. The
IALH equating is in closest agreement with the LA "scaling." This might
simply be due to differential sampling fluctuations in the item parameter
estimation procedure (almost but not quite identical samples were used
in the three calibrations, see pages 18 through 31) but the possible
superiority of the equatings based on homogeneous subsets deserves
further investigation.

The IEP equating has summary statistics quite similar to the IES
equating and not quite as similar to the LE and EE equatings. The IEP
equating is based on a stronger parameter linking than the IES equating
(spiralling versus a single LOGIST run), but the IEP estimates are potentially
subject to a practice or item position effect. Kingston and Dorans (1982)
have shown that the position of GRE verbal items when administered has
no systematic effect on item parameter estimates. Several factors could
be responsible for the differences (though relatively small) between the
IES and IEP and the LE and EE equating results. Though the relative
efficiency graphs (see Appendix B, Figures B.4a through B.4d) do not show
evidence of a lack of parallelism, form ZGR1 was more speeded than form
3CGR1 (80 percent of the examinees taking a spiralled subform (C47) of
ZGR1 reached 61 items; 80 percent of the examinees taking a similar
subform (C41) of 3CGR1 reached 65 items). Unlike the equatings for forms
K-ZGR3 and K-ZGR2, the equatings (IRT, linear, and equipercentile) for
form 3CGR1 were all based on samples from the same data (EE and LE were
based on identical data; IES, IESV, and IESH were all based on an almost
identical subset of the EE, LE data; IEP was based on an essentially
random one half of the IES sample).

The IAL and IAL2 equatings of form K-ZGR3 have very similar summary
statistics. The minor differences (.71 between means, .06 between standard
deviations) are a result of the extra link in the parameter scaling using
the IAL2 method. It is encouraging to see that these differences are
small. Ignoring the IALV and IAL2 equatings since there is no theoretical
reason for preferring them, the means and standard deviations for the
IRT equatings (IAL, IALH, IAW) are more similar to each other than they
are to those of the LE and EE equatings. Much of this difference might be
attributable to differences in the groups on which the equating data are
based. The three IRT equatings were based on data from a different group
of examinees than that available for the LE and EE equatings. It should
be noted that the .95 confidence interval of the LE equating is no smaller
than ±2.16 scaled score points at its smallest point, the mean of the
distribution (based on data given in Stewart, 1981).

The results of the K-ZGR2 verbal equatings are less clearcut. The
means based on each method differ from all other means by at least 1.02
and range from 496.68 (IAL) to 503.18 (IAW). The standard deviations
(ignoring IALV) range from 123.30 (IALH) to 126.26 (LA). The two most
similar results are for IAW and LA (difference in means was 1.02, difference
in standard deviations was .60).



Quantitative equatings. The quantitative equatings, as compared
using the means and standard deviations given in Table 42, appear to be
less similar than the verbal equatings. For form ZGR1, the linear equating
parameters from which the LA data were derived are part of the scaling for
the IAL equating. Thus, we would expect to reproduce the LA mean and
standard deviation quite closely. The difference in standard deviations
(.28) is acceptably small. The difference in means appears somewhat large
(2.05). Unfortunately, we do not have an estimate for the standard error
of equating for the IAL method to help put these differences in perspective.

All four quantitative equatings performed on form 3CGR1 were based on
data from the same administration. The IES mean was not so different from
the EE and LE means (1.12), but the IES and IEP means differed by 4.94
scaled score points. Even more striking is the difference between the IRT
based standard deviations and the EE and LE based standard deviations,
approximately 7 scaled-score points. While the parameter estimates for
the 3CGR1 items were based on samples of only about 2,500 for the IEP
equating and about 5,000 examinees for the IES equating, it seems unlikely
that these differences can be attributed solely to sampling fluctuation in
the parameter estimation process. Although the difference in means
between IEP and IES is in the direction that would be expected if there
were a practice effect (items being easier when calibrated in the fifth
section than when calibrated in the operational section), Kingston and
Dorans (1981) investigated practice effect on the item level and found no
evidence supporting this hypothesis for quantitative items.

Figure 27 compares the equating lines for the various methods used on
3CGR1 quantitative. The most striking result is the marked curvilinearity
of the IRT equatings. The EE equating is also quite nonlinear, although
not as much as the IRT equatings. The relative efficiency curves provide
direct evidence of marked nonparallelism of these two forms (Appendix
B, Figures B.8a and B.8b). In addition, examination of the formula
raw score data for spiralled samples based on subforms C49 (ZGR1) and
C44 (3CGR1) provides evidence of differential speededness. On ZGR1, 80
percent of the examinees reached item 50; on 3CGR1, 80 percent of the
examinees reached item 46. Similarly, on ZGR1, only 50.1 percent of the
examinees completed the test while, on 3CGR1, only 34.8 percent finished
the test. These results must be considered in light of the difficulty of
the two forms. The mean raw score of the ZGR1 sample was 2[?].59; in the
3CGR1 sample, it was 24.52. Thus, since the forms contained the same
number of items, the forms are of different speededness, and this might
bias the IES and IEP equatings.

Results for the K-ZGR3 and K-ZGR2 equatings are also difficult to
interpret. The means and standard deviations based on the IRT equatings
differ from the results of the linear equatings. For the IRT equatings,
we know there are potential problems with dimensionality and model fit.
For the K-ZGR3 quantitative equating, LE is really a complex combination
of equatings. The base of that series was the equating of the original
K-ZGR3 to ZGR1. Figure B.7a provides evidence that these forms have
markedly nonparallel quantitative sections. This explains the
curvilinearity of the IRT equatings for K-ZGR3, and the consistency of this
nonparallelism for quantitative forms suggests that the appropriateness
of linear equating for the quantitative section of the GRE should be
further investigated.

Analytical equatings. Statistics based on the analytical equatings
of form 3CGR1 are presented in Table 43. The most noticeable result is
the extremely low mean based on the IEP equating. This difference of
27.07 points between the IEP mean and the LE (the least different) mean is
due to practice effect, most noticeably on the analysis of explanations
items. This effect is more fully documented by Kingston and Dorans
(1981).

The mean and standard deviation for the IES equating are somewhat
different from those for LE and EE (.75 and 3.51 between IES and EE).
The relative efficiency graph (B.9a) and the curvilinearity of both the
IES and EE equatings suggest that the LE equating is not appropriate
because of the nonparallelism of the two forms. Problems with the model
fit of analysis of explanations items and the complex factor structure of
the analytical section further complicate the interpretation of these
results.

Shifts in dimensionality. A general consideration for interpreting
the results of GRE equatings is the possibility of shifts in the
dimensional characteristics of the test sections due to nonrandom choice of
administration dates by markedly different types of students. Mathematics
and science oriented students tend to take the GRE Aptitude Test in the
fall while social science and education students tend to take the test in
the spring. It is likely, to the extent that this difference in factor
structures across administrations exists, that all equating methods will
be somewhat affected, although perhaps to different degrees.



SUMMARY, DISCUSSION, AND RECOMMENDATIONS

The research reported here is based on the GRE Aptitude Test as it
was structured during the period from December 1979 through June 1980.
At an early stage of this research it was decided that the analytical
section would soon undergo substantial revision. Consequently, this
research focuses on the verbal and quantitative sections. Moreover,
in October 1981 the verbal and quantitative sections and the general
structure of the entire GRE Aptitude Test were revised. Factors from
this restructuring that are most likely to affect the use of item response
theory are the increase in the time-per-item allowance, changes in the
relative proportions of certain item types, and the shift from formula to
rights only scoring. It is difficult to forecast the exact effects of
these changes. Recommendations to be presented will be influenced by
expectations about the effects of these changes.

This final section of the report summarizes the findings of the
various portions of the research, and then synthesizes these findings.
The topics to be summarized are: the basic assumptions of item response
theory, implications of previous factor analytic research conducted on
the GRE Aptitude Test, assessment of the weak form of local independence,
analysis of item-ability regressions, temporal stability of item parameter
estimates, sensitivity of parameter estimates to violations of
unidimensionality, and comparisons of item response theory equating with
equipercentile and operational linear equating.

Summary

The basic assumptions of item response theory. One of the major
assumptions of item response theory is that performance on a set of items
is unidimensional, i.e., the probability of successful performance by
examinees on a set of items can be modeled by a mathematical model with
only one ability parameter. A second major assumption is that the
probability of successful performance on an item can be adequately described by
the three-parameter logistic model, a particular item response theory
model that seems particularly applicable to binary-scored multiple-choice
items.
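
As a minimal sketch of the three-parameter logistic model named above,
the item response function can be written
P(theta) = c + (1 - c) / (1 + exp(-D a (theta - b))). The scaling constant
D = 1.7, the function name p_3pl, and the item parameter values below are
assumptions made for illustration; they are not estimates from this research.

```python
import math

D = 1.7  # scaling constant; an assumed conventional value


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model.

    a = item discrimination, b = item difficulty,
    c = pseudoguessing parameter (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item with a = 1.0, b = 0.0, c = 0.2
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_3pl(theta, 1.0, 0.0, 0.2), 3))
```

At theta = b the function returns c + (1 - c)/2, i.e., the point halfway
between the lower asymptote c and 1.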

One consequence of the unidimensionality assumption is the mathematical
concept of local independence. The weak form of local independence, which
was assessed in this research, states that item responses are uncorrelated
at fixed levels of ability, i.e., after taking ability into account, there
are no systematic shared influences on item performance.

Implications of previous factor analytic research on the GRE Aptitude
Test. Four factor analytic research studies conducted on the GRE Aptitude
Test were reviewed in order to assess the dimensionality of the test, to
identify sets of homogeneous items, and to extract hypotheses about the
GRE Aptitude Test that could be tested in other phases of this research.
The four factor analytic studies provided strong evidence for the existence
of three large global factors: general quantitative ability, reading
comprehension or general verbal reasoning ability, and vocabulary or
discrete verbal ability. In addition, the factor analytic studies provided
evidence for the existence of several smaller factors: a data
interpretation factor, a technical reading comprehension factor, and a speed factor
on the verbal scale.



As a consequence of these studies, verbal items were separated into a
reading comprehension set and a discrete verbal set for the purposes of
item response theory analyses. However, the studies also suggested
separation of the data interpretation items from other quantitative items
and the further breakdown of reading comprehension items into a set of
technical reading comprehension items and a set of other reading
comprehension items. Doubts about the practical significance of these smaller
dimensions, coupled with the fact that there were too few items to yield
stable linking of ability scales through item response theory item difficulty
estimates, led to the conclusion that the construction of separate data
interpretation and reading comprehension scales was not feasible, given
the current structure of the GRE Aptitude Test.



Assessment of the weak form of local independence. The weak form
of local independence states that, for a given ability level, item responses
are uncorrelated. This local independence condition was assessed via the
examination of item intercorrelations with estimated ability partialled
out. Partial correlations both with and without a correction for guessing
were examined.



The analysis of partial correlations for the verbal subtest uncovered
two systematic sources of local independence violations: a reading
comprehension factor and speededness. The analysis of partial correlations
for the quantitative test revealed that the data interpretation items
retained positive intercorrelations after overall quantitative ability was
partialled out, thus providing evidence for another source of local
independence violations. In sum, the partial correlation analyses
produced findings consistent with expectations based on the previous
factor analytic studies.
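
The idea of item intercorrelations with estimated ability partialled out
can be sketched with the standard first-order partial correlation formula,
r_ij.t = (r_ij - r_it r_jt) / sqrt((1 - r_it^2)(1 - r_jt^2)). This is only
an illustration: the function names and the small data set are hypothetical,
plain product-moment correlations are assumed, and the correction for
guessing mentioned above is not implemented.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(item_i, item_j, theta):
    """Correlation between two items with estimated ability partialled out."""
    r_ij = pearson(item_i, item_j)
    r_it = pearson(item_i, theta)
    r_jt = pearson(item_j, theta)
    return (r_ij - r_it * r_jt) / math.sqrt((1 - r_it ** 2) * (1 - r_jt ** 2))


# Hypothetical ability estimates and 0/1 item scores for eight examinees
theta = [-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5]
item_1 = [0, 0, 0, 1, 0, 1, 1, 1]
item_2 = [0, 0, 1, 0, 1, 1, 1, 1]
print(partial_corr(item_1, item_2, theta))
```

Under the weak form of local independence, partial correlations of this
kind should be near zero; items that retain positive values share some
influence beyond the ability that was partialled out.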

Analysis of item-ability regressions. The item response function of
item response theory can be viewed as a theoretical form for the regression
of item score (1 = a correct response, 0 = an incorrect response) onto
underlying ability. Actual item performance for each ability level can be
obtained from the data and plotted for various levels of ability to obtain
an empirical item-ability regression. Comparisons of estimated item
response functions to actual item-ability regressions enable one to
assess the fit of the three-parameter logistic model to the data. A
graphical technique, referred to as analysis of item-ability regressions,
was devised to assess fit via these comparisons of estimated and empirical
item-ability regressions.
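
A minimal sketch of such a comparison follows. It groups examinees into
ability intervals, computes the proportion correct in each interval, and
subtracts the value of an estimated three-parameter logistic function at
the interval midpoint. The interval width, the scaling constant D = 1.7,
the function names, and the data are all assumptions for illustration;
this is not the graphical procedure actually used in the research.

```python
import math
from collections import defaultdict


def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def empirical_regression(thetas, responses, width=0.5):
    """Map each ability-interval midpoint to (n examinees, proportion correct)."""
    groups = defaultdict(list)
    for theta, u in zip(thetas, responses):
        groups[math.floor(theta / width)].append(u)
    return {(k + 0.5) * width: (len(us), sum(us) / len(us))
            for k, us in sorted(groups.items())}


def residuals(thetas, responses, a, b, c, width=0.5):
    """Empirical proportion correct minus the model value at each midpoint."""
    empirical = empirical_regression(thetas, responses, width)
    return {mid: prop - p_3pl(mid, a, b, c)
            for mid, (n, prop) in empirical.items()}


# Hypothetical ability estimates and 0/1 scores on one item
thetas = [-1.4, -1.1, -0.6, -0.2, 0.1, 0.4, 0.9, 1.3]
scores = [0, 0, 1, 0, 1, 1, 1, 1]
for mid, diff in residuals(thetas, scores, a=1.0, b=0.0, c=0.2).items():
    print(mid, round(diff, 3))
```

Large or systematically signed residuals across intervals, or empirical
proportions that fail to increase with ability, would suggest poorer fit.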



On the basis of the analysis of item-ability regressions, it was
determined that all of the verbal item types and two of the analytical
item types, logical diagrams and analytical reasoning, seemed to be fit
better by the three-parameter logistic model than the three quantitative
item types and the analytical analysis of explanations item type. Of
these latter four item types, regular mathematics and data interpretation
items seemed to be fit only a little less well than some of the
better-fitted item types. Quantitative comparison items were the most difficult items for
the three-parameter logistic model to fit. Analysis of explanations items
keyed other than B or E were fit by the model quite well, but those keyed
B or E had the highest proportion of model fit scores that indicate poorer
fit of any of the item classifications under study.

Temporal stability of item parameter estimates. Theoretically, an
item response function for an item should not be affected by when the item
was administered, provided a common ability metric has been established.
The section on parameter estimation and item linking procedures described
the procedures used to place all item parameter estimates on the same
scale. The dual administrations of Form ZGR1, once in February 1980 and
once in June 1980, enabled us to assess the temporal stability of item
parameter estimates.

For the discrete verbal items, the item difficulty parameter, b,
the item discrimination parameter, a, and the item response function
derived estimate of conventional item difficulty, p, all exhibited
much temporal stability. The pseudoguessing parameter, c, which is the
most difficult parameter to estimate, exhibited less temporal stability.

For the reading comprehension items, b, a, and p all exhibited
much temporal stability. The c estimates, however, were much more
sensitive to administration date.

All quantitative items had very stable item parameter (a, b, and
c) estimates, and very similar conventional item difficulty estimates,
p, over time.

Sensitivity of parameter estimates to violations of unidimensionality.
Evidence indicating that verbal items are not homogeneous, i.e., that they
measure more than one dimension, was presented in the factor analytic
review, the assessment of local independence, and the item-ability
regressions. Comparisons of item parameter estimates based on calibration of
heterogeneous sets (all verbal items) and homogeneous sets (discrete
verbal only or reading comprehension only) were suggested by these earlier
results.

Discrete verbal and all verbal calibrations of discrete verbal
items produced considerably more similar estimates of item discrimination
than the reading comprehension and all verbal calibrations of reading
comprehension items. The discrete verbal and all verbal calibrations
produced slightly more similar estimates of item difficulty, b, for the
discrete verbal items than the reading comprehension item estimates of b
produced by the reading comprehension and all verbal calibrations. When
compared to the results for a estimates, the b estimates exhibited much
less sensitivity to homogeneity of item sets.






With the exception of the c estimates of the discrete verbal items
of form K-ZGR2, the c estimates appeared fairly robust to heterogeneity
of item calibration set. The exceptional results obtained for the discrete
verbal items of form K-ZGR2 were an artifact produced by the choice of
constraints used by LOGIST to estimate c for items that are deemed too
easy to provide well-determined estimates of c. Compared to a and b
estimates, however, the c estimates reflected greater sensitivity to item
heterogeneity, a result partly reflecting difficulties inherent in obtaining
stable estimates of c.

The similarity of p estimates based on heterogeneous versus
homogeneous calibrations was very high. An inference suggested by this
high degree of similarity is that the observed data can be approximated
as well by sets of heterogeneous items (all verbal) as by sets of
homogeneous items (discrete verbal, or reading comprehension).

Comparability of ability estimates based on homogeneous and
heterogeneous sets of items. All verbal items were calibrated at least twice,
once with a set of homogeneous items of like type, e.g., discrete verbal
or reading comprehension, and once with a set of heterogeneous items
composed of both discrete verbal and reading comprehension items.
This procedure produced three ability scores for each examinee: a verbal
ability score based on all verbal items, a discrete verbal ability
score based on discrete verbal items, and a reading comprehension score
based on reading comprehension items. Correlations among these ability
estimates and among proportion-correct true scores based on these ability
estimates provided evidence for the existence of two distinct, highly
correlated reading comprehension and discrete verbal abilities. Evidence
was also provided for thinking of the overall verbal ability score as a
weighted composite of the discrete verbal and reading comprehension
abilities. Although the overall verbal ability score appears to have
resulted from LOGIST being drawn toward the discrete verbal dimension
during parameter estimation iterations, the correlations it has with the
discrete verbal and reading comprehension abilities are consistent
with the correlations one would expect if the overall verbal
proportion-correct true score were defined as a weighted composite of the discrete
verbal and reading comprehension true scores, where the weights were the
relative numbers of discrete verbal and reading comprehension items,
respectively. Of course, the correlation between discrete verbal and
reading comprehension abilities is high enough to ensure that any set of
positive weighting coefficients would produce a composite dimension that
was proximate to the verbal dimension. In sum, the evidence provided
support for the existence of two distinct, highly correlated discrete
verbal and reading comprehension abilities that can be combined to produce
a composite ability that closely resembles the general verbal ability
dimension defined by LOGIST.
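
The weighted composite described above can be sketched as follows. A
proportion-correct true score is taken here to be the mean of the item
response functions at a given ability, and the composite weights each
subscale true score by its number of items. The item parameters, ability
values, function names, and D = 1.7 are hypothetical choices made only
for illustration.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Proportion-correct true score: mean item response function at theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items) / len(items)


def weighted_composite(ts_dv, n_dv, ts_rc, n_rc):
    """Composite true score weighted by the relative numbers of items."""
    return (n_dv * ts_dv + n_rc * ts_rc) / (n_dv + n_rc)


# Hypothetical (a, b, c) parameters for discrete verbal and reading items
dv_items = [(1.0, -0.5, 0.20), (0.8, 0.3, 0.20), (1.2, 1.0, 0.15)]
rc_items = [(0.7, 0.0, 0.25), (0.9, 0.6, 0.20)]

# Hypothetical ability estimates for one examinee on the two subscales
theta_dv, theta_rc = 0.4, 0.1
composite = weighted_composite(true_score(theta_dv, dv_items), len(dv_items),
                               true_score(theta_rc, rc_items), len(rc_items))
print(round(composite, 3))
```

When the two subscale abilities coincide, this composite equals the
proportion-correct true score computed over the pooled item set, which is
one way to see why item-count weights are a natural choice.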

Equating comparisons. A statistical equating method is an empirical
procedure for determining a transformation to be applied to the scores on
one form to produce scores that are on the same scale as the other form.
As such it consists of two parts, a data collection design and a set of
rules for determining the transformation. Two data collection designs
(equivalent group and anchor test) and three general statistical methods
of equating (equipercentile equating, linear equating, and item response
theory based true score equating) were used in this research.
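
To make the two non-IRT rules concrete, the sketch below applies a simple
linear equating (matching means and standard deviations) and a simplified
equipercentile equating (matching percentile ranks, with linear
interpolation) under an equivalent groups design. The operational
procedures, and in particular the anchor test versions, are more involved;
the function names, the mid-percentile-rank convention, and the score
lists here are assumptions for illustration only.

```python
import statistics


def linear_equate(x, x_scores, y_scores):
    """Place score x on the Y scale by matching means and standard deviations."""
    mean_x, mean_y = statistics.mean(x_scores), statistics.mean(y_scores)
    sd_x, sd_y = statistics.pstdev(x_scores), statistics.pstdev(y_scores)
    return mean_y + (sd_y / sd_x) * (x - mean_x)


def percentile_rank(scores, x):
    """Proportion of scores below x plus half the proportion equal to x."""
    below = sum(1 for s in scores if s < x)
    at = sum(1 for s in scores if s == x)
    return (below + 0.5 * at) / len(scores)


def equipercentile_equate(x, x_scores, y_scores):
    """Find the Y score whose percentile rank matches that of x on form X."""
    p = percentile_rank(x_scores, x)
    ys = sorted(set(y_scores))
    ranks = [percentile_rank(y_scores, y) for y in ys]
    if p <= ranks[0]:
        return float(ys[0])
    if p >= ranks[-1]:
        return float(ys[-1])
    for k in range(1, len(ys)):
        if p <= ranks[k]:
            frac = (p - ranks[k - 1]) / (ranks[k] - ranks[k - 1])
            return ys[k - 1] + frac * (ys[k] - ys[k - 1])


# Hypothetical raw scores from two equivalent groups
form_x_scores = [10, 12, 12, 15, 18, 20]
form_y_scores = [13, 15, 15, 18, 21, 23]
print(linear_equate(12, form_x_scores, form_y_scores))
print(equipercentile_equate(12, form_x_scores, form_y_scores))
```

The two rules agree when the score distributions differ only in location
and spread; they diverge when the relationship between forms is
curvilinear.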


In general, IRT equating methods seemed to give reasonable results for
the verbal equatings. The results for the quantitative section equatings
are more questionable for several reasons: the relatively poor model fit
of the quantitative items, particularly quantitative comparison items, and
the possible shifts in dimensionality due to nonrandom choice of
administration dates by markedly different types of students. That is, mathematics
and science oriented students tend to take the GRE Aptitude Test in the
fall and social science and education students tend to take the test in
the spring. Results for the analytical section are marked by the large
practice effect for IEP equating. The IES equating seems reasonable.



Synthesis

The major purposes of this research were to address the
reasonableness of the assumptions of item response theory and the robustness
of item response theory methods (applied to the GRE Aptitude Test) to
violations of these assumptions. The research was motivated by a need to
address the psychometric feasibility of applying IRT methods to the GRE
Aptitude Test items and populations. Test disclosure legislation and its
effects on operational equating strategies served as a major impetus for
the need to address psychometric feasibility. If applicable to the GRE
Aptitude Test, item response theory would provide powerful, flexible tools
for in-depth analysis of test forms and items, the maintenance of score
scales via equating, and the development of better and more efficient test
forms that could be tailored to fit specific needs.

Fit of item response theory model to the GRE Aptitude Test items and
examinee populations. Any evaluation of the fit of a mathematical model
to data should be made from a realistic point of view that recognizes that
all models are the products of human minds that attempt to understand
and predict phenomena. As such, models never completely fit the data.
Fit is a matter of degree.

The three-parameter logistic model seems to fit the GRE Aptitude
Test data reasonably well for verbal and less well for quantitative and
analytical. Evidence exists for the violation of local independence on
all three scales of the test. On the verbal scale, the factors underlying
reading comprehension items, particularly technical reading comprehension
items, and speededness contribute to the lack of fit of the three-parameter
logistic model to verbal items. Despite the existence of these sources of
local independence violations, the model fits all verbal items reasonably
well, as evidenced in the item-ability regression analysis, the relative
insensitivity of item parameter estimates to homogeneity of item parameter
estimation sets, and the verbal equating results. The shift to number
right scoring will probably not enhance the fit of the three-parameter
logistic model to verbal item types. The increased time per item should
diminish discrepancies between IRT and other equatings when forms are
differentially speeded.

On the quantitative scale, the data interpretation items were influenced
by some systematic source of local independence violations, as evidenced
in the chapters on the factor analysis review and the assessment of the
weak form of local independence. The item-ability regression analyses and
the equating results demonstrated that the three-parameter logistic model
does not fit the quantitative items as well as it fits the verbal items. The
quantitative comparison item type was the most difficult item type to fit;
there were some instances of marked nonmonotonicity of empirical
item-ability regressions for this item type. The relative lack of statistical
parallelism of the quantitative tests probably contributed to the greater
dissimilarity between scaled score distributions produced by the IRT
methods and those produced by the operational linear method.

The three-parameter model fits the verbal items better than the
quantitative items despite the fact that the dimensionality analyses
appear to indicate that dimensionality is a greater problem with the
verbal item types than with the quantitative item types.

Application of the common factor model, a linear model, to the GRE
Aptitude Test clearly identified two major verbal dimensions, reading
comprehension and discrete verbal, as well as some minor dimensions. On
the other hand, factor analyses of the quantitative items did not produce
two clearly defined major dimensions. Perhaps, however, the subtle
dimensionality problems implied by the item-ability regression analysis
present a greater problem for the quantitative scale than does the grosser
multidimensionality of the verbal scale. The verbal scale appears to be
composed of two clearly defined, highly correlated dimensions that are
amenable to modelling by a two-factor linear model. The high correlations
between the two dimensions indicate that, while distinct, the two major
categories of items are not very far from being considered functionally
homogeneous. As a consequence of this functional homogeneity, the
three-parameter logistic model fits the verbal data well, and the results of IRT
and linear equating are to a large degree similar.

In contrast, the quantitative scale does not seem to be fit as well
by either the nonlinear three-parameter logistic model or a linear
model. As a consequence, the linear common factor model does not describe
quantitative data as well as it does verbal data and is, therefore, less
useful as a tool for accurately assessing the dimensionality of the
quantitative items. In other words, the quantitative scale may be composed
of heterogeneous items that are influenced by multiple dimensions that
cannot be adequately described by the linear common factor model. Empirical
evidence for this hypothesis exists in the relative efficiency curves for
the quantitative subtests and the observed correlations between the
different quantitative item types. The former demonstrate a relative lack
of statistical parallelism, while the latter demonstrate that data
interpretation items share relatively little in common with other quantitative
items.






The three-parameter logistic model does not fit analytical items
as well as it fits verbal items. The soon-to-be-replaced analysis of
explanations item type is the major source of local independence violations.
This item type is very susceptible to practice effects, which are problematic
for the precalibration (IEP) method of IRT equating. In addition, these
items exhibit instances of nonmonotonic empirical item-ability regressions
when the keyed response is option B or E. Due to the planned major
overhaul of the analytical section, this research did not focus on this
section. The analytical section was examined closely enough to confirm
the wisdom of the decision to remove the analysis of explanations item
type. More complete evidence for the wisdom of this decision is contained
in Kingston and Dorans (1982).

Applicability of item response theory equating methods. The aspect
of this research with the most direct bottom-line implications is the
equating comparisons. Due to test disclosure legislation, the current
linear method may no longer be a feasible equating procedure. A
replacement or supplement should be found. Item response theory equating is
particularly desirable because of other powerful statistical tools it
provides in addition to equating. Lord (1980a) describes several of these
powerful tools that item response theory can supply to the testing world.
In this research, six different variants of item response theory true
score equating were examined. Of these six approaches, the precalibration
(IEP) method holds the most promise for coping with the constraints
imposed by test disclosure legislation. Unfortunately, it is the IRT
method most susceptible to practice effects, as witnessed in the analytical
equatings of form 3CGR1. The other sections of the Aptitude Test do not
show this practice effect, but a subtle effect that causes a systematic
scale drift might exist. Consequently, the susceptibility of particular
item types to practice effects determines, to a large extent, the
feasibility of using the IEP method for equating.

While a companion report describes practice effect in detail, a
summary of these findings suffices for our purposes of assessing the
feasibility of using the IEP method of IRT equating on the GRE Aptitude
Test. The discrete verbal item type is not susceptible to practice
effects. The reading comprehension item type shows evidence of a possible
fatigue effect. While the analysis of explanations items are very
susceptible, neither logical diagrams nor analytical reasoning items are very
susceptible. None of the quantitative item types appear to be susceptible
to practice effects.

In sum, the item response theory model and the precalibration method
of IRT equating are most applicable to verbal item types, less applicable
to quantitative item types because of dimensionality problems with data
interpretation items and instances of nonmonotonicity for quantitative
comparison items, and least applicable to the existing analytical item
types because of the severe practice effects associated with the analysis
of explanations item type and its other problems. Planned revisions of
the analytical section, particularly the removal of the troublesome
analysis of explanations item type, should enhance the fit and applicability
of the three-parameter model to the analytical scale. Planned revisions
to the verbal section are not expected to affect greatly the satisfactory
fit of the model to verbal item types. It is unlikely that planned
revisions will improve the appropriateness of IRT methods for the
heterogeneous quantitative scale. A fuller understanding of the workings
of this rather complex scale is needed.






REFERENCES 

Bejar, I. A procedure for investigating the unidimensionality of
achievement tests based on item parameter estimates. Journal of Educational
Measurement, 1980, 17, 283-296.

Braun, H., & Holland, P. Observed score test equating: A mathematical
analysis of some ETS equating procedures. In P. Holland (Ed.),
Proceedings of the ETS Research Statistics Conference on Test Equating.
New York: Academic Press, in press.

Carroll, J. B. The effect of difficulty and chance success on correlations
between items or between tests. Psychometrika, 1945, 10, 1-20.

Conrad, L., Trismen, D., & Miller, R. (Eds.). Graduate Record
Examinations Technical Manual. Princeton, N.J.: Educational Testing Service,
1977.

Cowell, W. ICC preequating in the TOEFL testing program. Paper presented
at the meeting of the American Educational Research Association
and the National Council on Measurement in Education, San Francisco,
April 11, 1979.

Dorans, N. The need for a common metric in item bias studies. U.S.
Office of Personnel Management Report TM79-20. Washington, D.C.:
U.S. Office of Personnel Management, 1979.

Dressel, P. L. Some remarks on the Kuder-Richardson reliability coefficient.
Psychometrika, 1940, 5, 305-310.

Dwyer, P. S. The determination of the factor loadings of a given test
from the known factor loadings of other tests. Psychometrika,
1937, 2, 173-178.

Ferguson, G. A. The factorial interpretation of test difficulty.
Psychometrika, 1941, 6, 323-329.

Gibson, W. A. Three multivariate models: Factor analysis, latent
structure analysis, and latent profile analysis. Psychometrika,
1959, 24, 229-252.

Gibson, W. A. Nonlinear factors in two dimensions. Psychometrika,
1960, 25, 381-392.

Gourlay, N. Difficulty factors arising from the use of tetrachoric
correlations in factor analysis. British Journal of Psychology,
Statistical Section, 1951, 4, 65-73.

Guilford, J. P. The difficulty of a test and its factor composition.
Psychometrika, 1941, 6, 67-77.






Hambleton, R. Latent ability scales: Interpretations and uses. In S.
Mayo (Ed.), New Directions for Testing and Measurement: Interpreting
Test Performance, no. 6. San Francisco: Jossey-Bass, 1980.

Hambleton, R., & Cook, L. Latent trait models and their use in the
analysis of educational test data. Journal of Educational
Measurement, 1977, 14, 75-96.

Hambleton, R., Swaminathan, H., Cook, L., Eignor, D., & Gifford, J.
Developments in latent trait theory: Models, technical issues, and
applications. Review of Educational Research, 1978, 48, 467-510.

Harman, H. Modern factor analysis (3rd edition). Chicago: University
of Chicago Press, 1976.

Jaeger, R. M. Some exploratory indices for selection of a test equating
method. Journal of Educational Measurement, 1981, 18, 23-38.

Jennrich, R. I., & Sampson, P. F. Rotation for simple loadings.
Psychometrika, 1966, 31, 313-323.

Joreskog, K. G. Structural analysis of covariance and correlation
matrices. Psychometrika, 1978, 43, 443-477.

Kaiser, H. F. The varimax criterion for analytical rotation in factor
analysis. Psychometrika, 1958, 23, 187-200.

Kingston, N. M., & Dorans, N. J. The effect of the position of an item
within a test on item responding behavior: An analysis based on item
response theory. Draft report, 1982.

Lord, F. A survey of equating methods based on item characteristic
curve theory. Research Bulletin 75-13. Princeton, N.J.: Educational
Testing Service, 1975a.

Lord, F. Evaluation with artificial data of a procedure for estimating
ability and item characteristic curve parameters. Research Bulletin
75-33. Princeton, N.J.: Educational Testing Service, 1975b.

Lord, F. Practical applications of item characteristic curve theory.
Journal of Educational Measurement, 1977, 14, 117-138.

Lord, F. Applications of item response theory to practical testing
problems. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1980a.

Lord, F. Personal communication, 1980b.

Lord, F. Personal communication, 1981.

Marco, G. Item characteristic curve solutions to three intractable
testing problems. Journal of Educational Measurement, 1977, 14,
139-160.






Marco, G., Petersen, N., & Stewart, E. E. A test of the adequacy
of curvilinear equating methods. Paper presented at the 1979
Computerized Adaptive Testing Conference, Minneapolis, June 28, 1979.

McDonald, R. P. Nonlinear factor analysis. Psychometric Monographs,
1967, No. 15.

Morris, C. On the foundations of test equating. In P. Holland (Ed.),
Proceedings of the ETS Research Statistics Conference on Test Equating.
New York: Academic Press, in press.

Mosteller, F., & Tukey, J. Data analysis and regression. Reading,
Mass.: Addison-Wesley Publishing Company, 1977.

Petersen, N., Cook, L., & Stocking, M. IRT versus conventional equating
methods: A comparative study of scale stability. Paper presented at
the meeting of the American Educational Research Association, Los
Angeles, April 14, 1981.

Petersen, N., Marco, G., & Stewart, E. E. A test of the adequacy of
linear score equating models. In P. Holland (Ed.), Proceedings of
the ETS Research Statistics Conference on Test Equating. New York:
Academic Press, in press.

Potthoff, R. Some issues in test equating. In P. Holland (Ed.),
Proceedings of the ETS Research Statistics Conference on Test Equating.
New York: Academic Press, in press.

Powers, D. E., Swinton, S. S., & Carlson, A. B. A factor analytic study
of the GRE Aptitude Test. GRE Board Professional Report, GREB
No. 75-11P. Princeton, N.J.: Educational Testing Service, 1977.

Powers, D. E., Swinton, S. S., Thayer, D., & Yates, A. A factor analytic
study of seven experimental analytical item types. GRE Board
Professional Report 77-7P. Princeton, N.J.: Educational Testing
Service, 1978.

Rasch, G. Probabilistic models for some intelligence and attainment
tests. Copenhagen: Nielson and Lydicke (for Denmarks Paedagogiske
Institut), 1960.

Reckase, M. D. Unifactor latent trait models applied to multifactor
tests: Results and implications. Journal of Educational Statistics,
1979, 4, 207-230.

Rock, D., Werts, C., & Grandy, J. Construct validity of the GRE across
populations - an empirical confirmatory study. Draft report, 1980.

Stewart, E. E. Equating the Graduate Record Examinations Aptitude Test in
the 1980's. Paper submitted to GRE Board Research Committee, April
1981.






Stocking, M. Personal communication, 1980.

Swinton, S. S. Personal communication, 1980.

Swinton, S. S., & Powers, D. E. A factor analytic study of the
restructured Aptitude Test. GRE Board Professional Report 77-6P.
Princeton, N.J.: Educational Testing Service, 1980.

Thurstone, L. L. Multiple common factor analysis. Chicago: University
of Chicago Press, 1947.

Tucker, L. R., Koopman, R. F., & Linn, R. L. Evaluation of factor
analytic research procedures by means of simulated correlation
matrices. Psychometrika, 1969, 34, 421-459.

Warm, T. A primer of item response theory (CG-941278). Oklahoma City:
U.S. Coast Guard Institute, December 1978. (NTIS No. AD-A0630).

Wherry, R., & Gaylord, R. Factor pattern of test items and tests
as a function of the correlation coefficient: Content, difficulty,
and constant error factors. Psychometrika, 1944, 9, 237-244.

Wood, R. L., Wingersky, M., & Lord, F. LOGIST: A computer program
for estimating examinee ability and item characteristic curve
parameters. ETS Research Memorandum 76-6 (modified 1/78). Princeton,
N.J.: Educational Testing Service, 1978.

Wright, B. Solving measurement problems with the Rasch model. Journal
of Educational Measurement, 1977, 14, 97-116.

Yates, A. An oblique transformation method for primary factor pattern
simplification which permits factorial complexity in exploratory
analyses. Paper presented at the meeting of the Psychometric
Society, Palo Alto, 1974.






Appendix A 

Score Conversion Tables for Various
Equatings of the Verbal, Quantitative,
and Analytical Sections of Forms ZGR1,
K-ZGR2, K-ZGR3, and 3CGR1






Table A.1

Score Conversion Table for Verbal Scale of
Form ZGR1 (2/80)






RAW SCORE     FREQ       Ut      UtH     lALV       tA

  80.00        0.0   846.11   846.11   846.11   846.12
  79.00       3.00   838.49   838.58   838.57   838.33
  78.00       7.00   830.64   831.07   831.00   830.54
  77.00       2.00   822.48   823.35   823.13   822.76
  76.00       9.00   814.23   815.55   815.10   814.97
  75.00      10.00   806.04   807.72   807.02   807.18
  74.00      17.00   797.93   799.92   798.96   799.39
  73.00      14.00   789.92   792.14   790.93   791.61
  72.00      12.00   781.98   784.40   782.94   783.82
  71.00      26.00   774.12   776.67   774.99   776.03
  70.00      34.00   766.31   768.96   767.08   768.24
  69.00      39.00   758.54   761.26   759.19   760.46
  68.00      40.00   750.80   753.55   751.32   752.67
  67.00      24.00   743.09   745.85   743.47   744.88
  66.00      54.00   735.40   738.14   735.62   737.10
  65.00      55.00   727.73   730.42   727.77   729.31
  64.00      70.00   720.07   722.69   719.92   721.52
  63.00      63.00   712.42   714.95   712.08   713.73
  62.00      52.00   704.78   707.20   704.23   705.95
  61.00      78.00   697.14   699.44   696.38   698.16
  60.00      72.00   689.50   691.66   688.54   690.37
  59.00      88.00   681.86   683.88   680.69   682.58
  58.00      88.00   674.22   676.08   672.86   674.80
  57.00      86.00   666.57   668.27   665.03   667.01
  56.00      95.00   658.93   660.46   657.21   659.22
  55.00     103.00   651.27   652.63   649.41   651.43
  54.00     107.00   643.61   644.80   641.62   643.65
  53.00     129.00   635.95   636.96   633.85   635.86
  52.00     122.00   628.28   629.12   626.11   628.07
  51.00     143.00   620.60   621.26   618.39   620.28
  50.00     132.00   612.92   613.41   610.69   612.50
  49.00     140.00   605.22   605.54   603.02   604.71
  48.00     129.00   597.52   597.68   595.38   596.92
  47.00     178.00   589.81   589.81   587.75   589.13
  46.00     177.00   582.09   581.93   580.15   581.35
  45.00     151.00   574.36   574.05   572.58   573.56
  44.00     162.00   566.63   566.17   565.01   565.77
  43.00     173.00   558.87   558.29   557.47   557.98
  42.00     189.00   551.11   550.41   549.94   550.20
  41.00     174.00   543.33   542.52   542.42   542.41
  40.00     207.00   535.54   534.63   534.90   534.62
  39.00     158.00   527.73   526.74   527.39   526.84
  38.00     196.00   519.91   518.65   519.88   519.05
  37.00     177.00   512.07   510.95   512.37   511.26
  36.00     194.00   504.21   503.05   504.85   503.47
  35.00     204.00   496.34   495.15   497.33   495.69
  34.00     217.00   488.44   487.26   489.79   487.90
  33.00     222.00   480.53   479.36   482.25   480.11
  32.00     192.00   472.61   471.46   474.70   472.32
  31.00     190.00   464.66   463.57   467.14   464.54
  30.00     199.00   456.69   455.68   459.57   456.75
  29.00     192.00   448.72   447.79   451.98   448.96






Table A.1 continued

Score Conversion Table for Verbal Scale of
Form ZGR1 (2/80)



com 00 


1 AO t\f\ 


C fm 00 


1 "VV AA 

1 73.00 


•m X A A 

Z6« 00 


lor. UO 




t TA AA 
1 rU. UU 


00 


Id A A 

1 93. 00 


Z3« 00 


% £.» A A 


22m 00 


1 C T A A 

1 ^ r . 00 


2 1« 00 


1 ^ ^ A A 

136. 00 


20« 00 


• V A A A 

1 30.00 


1 9« 00 


121. 00 


I6«00 


113. 00 


1 7« 00 


A V A A 

93. 00 


I6«00 


1 07.00 


1 5« 00 


96. 00 


I4« 00 


1 09. 00 


1 3« 00' 


A 9 A A 


12. 00 


99.00 


1 1 • 00 


67. 00 


I0« 00 


66.00 


9m 00 


00 


A. 00 


^2. 00 


7.00 


46«00 


hm 00 


C 1 AA 
91 . 00 


5.00 


38.00 


^•00 


50.00 


3^00 


38.00 


2.00 


21.00 


I. 00 


37.00 


0.0 


28.00 


-I. 00 


IS. 00 


-2. 00 


19.00 


-3.00 


7.00 


-^.00 


9.00 


-5.00 


3.00 


-6.00 


4.00 


-7.00 


1.00 


-8.00 


1.00 


" -9. 00 


I. 00 


* I 0. 00 


0.0 


-11.00 


. 0.9 


-12.00 


1.00 



440.72 439.90 444.39 441.17 

432.71 432.02 436.78 433.39 

424.69 424.14 429.17 425.60 

416.65 416.27 421.54 417.81 
408.60 408.40 413.90 410.02 
400.54 400.54 406.25 402.24
392.46 392.69 398.60 394.45 
384.38 384.84 390.93 386.66 
376.29 377.00 383.24 378.87 
368.19 369.16 375.55 371.09 
360.09 361.33 367.84 363.10 
351.98 353.51 360.12 355.51 
343.87 345.70 352.38 347.73 
335.77 337.89 344.63 339.94 

327.66 330.10 336.86 332.15 
319.57 322.31 329.06 324.36
311.49 314.54 l^L-^* 316.58 
303.42 306.78 ITJT^I 308.79 
295.38 299.04 305.54 301.00 

287.36 291.32 297.64 293.21 

279.37 283.61 289.70 285.43 

271.41 275.91 281.73 277.64 
263.49 268.23 273.73 269.85 
255.63 260.56 265.69 262.06 
247.81 252.89 257.61 254.28 
240.04 245.23 249.51 246.49 
232.33 237.55 241.39 238.70 

224.67 229.86 233.25 230.91 
217.06 222.13 225.08 223.13 
209.45 214.35 216.86 215.34 

201.42 206.42 208.45 207.55
193.76 198.35 200.13 199.76 
186.11 190.60 192.36 191.98 
178.45 182.86 184.60 184.19 
170.80 175.11 176.84 176.40 
163.14 167.37 169.07 168.61 
155.49 159.62 161.31 160.83 
147.83 151.88 153.55 153.04
140.18 144.13 145.79 145.25
132.52 136.39 138.02 137.47
124.86 128.64 130.26 129.68






Table A.2 



Score Conversion Table for Verbal Scale of 
Form K-ZGR2 



RAW SCORE FREQ



80.00
79.00
78.00
77.00
76.00
75.00
74.00
73.00
72.00
71.00
70.00
69.00
68.00
67.00
66.00
65.00
64.00
63.00
62.00
61.00
60.00
59.00
58.00
57.00
56.00
55.00
54.00
53.00
52.00
51.00
50.00
49.00
48.00
47.00
46.00
45.00
44.00
43.00
42.00
41.00
40.00
39.00
38.00
37.00
36.00
35.00
34.00
33.00
32.00
31.00
30.00
29.00



0.0 
1.00 
5.00 
5.00 
7.00 
14.00 
25.00 
18.00 
19.00 
26.00 
22.00 
33.00 
39.00 
38.00 
41.00 
51.00 
53.00 
64.00 
61.00 
62.00 
79.00 
106.00 
102.00 
93.00 
101.00 
113.00 
135.00 
132.00 
140.00 
l?fl.00 
146.00 
140.00 
150.00 
171.00 
158.00 
193.00 
190.00 
, 172.00 
197.00 
201.00 
179.00 
187.00 
212.00 
220.00 
202.00 
187.00 
215.00 
209.00 
204.00 
181.00 
175.00 
221.00 



IAL

846.11 

839.75 

830.95 

821.92 

812.17 

803.06 

794.21 

785.60 

777. 16 

768.89 

760.74 

752.69 

744.72 

736.82 

728.97 

721.16 

713.37 

705.60 

697.83 

690.06 

682.28 

674.49 

666.68 

658.85 

651.00 

643. 13 

635.2^ 

627.91 

619.37 

611.41 

603.43 

595.43 

587.43 

579,41 

571.39 

563.36 

555.33 

547.30 

539.27 

531.24 

523.21 

515. 18 

507.16 

499.14 

491.13 

483*12 

475.12 

467.13 

459.15 

451.18 

443.22 

435.27 



IAH

846. II 
840.24 
831.81 
823.13 
814.72 
806.62 
798.75 
791.04 
783.45 
775.92 
768.43 
760.96 
753.47 
745.96 
738.41 
730.83 
723.20 
715.53 
707.80 
700.02 
692. IB 
684.28 
676.33 
668.33 
660.27 
652.16 
644.01 
635.81 
627.55 
619.30 
611.03 
602.73 
594.42 
986.11 
977.79 
569.49 
561. 19 
552.92 
544.67 
536.45 
528.26 
520. 10 
511.98 
503.90 
495.86 
487.83 
479.85 
471.91 
463.99 
456.10 
448.24 
440.41 



IALH

846.11 
839.09 
829.76 
819.87 

810.16 
800.80 
791.77 
783.04 
774.57 
766.30 
758.19 
750.23 
742.39 
734.63 
726.94 
719.31 
71 1.72 
704.15 
696.60 
689.05 
681.49 
673.92 
666.34 
658.72 
651.08 
643.40 
635.68 
627.93 
620.14 
612.31 
1604.44 
596.54 
988.60 
980.64 
572.66 
564.66 
556.65 
548.63 
540.60 
532.57 
924.56 
516.55 
508.56 
500.59 
492.65 
484.73 
476.84 
468.98 
441.15 
453.35 
449.98 
437.84 



IALV

846 . 11 

839.07 

829.67 

819.55 

809.94 

799.86 

790.93 

781.54 

772.83 

764.35 

756.06 

747.93 

739.94 

732.05 

724.24 

716.50 

708.81 

70fW5 

693.5? 

685.91 

678.30 

670.69 

663.09 

655.48 

647.86 

640.23 

632.59 

624.94 

617.28 

609.62 

601.94 

594.27 

586.99 

578.91 

571.24 

963.57 

555.90 

948.24 

540.59 

532.95 

525.32 

517.70 

510.10 

502.91 

494.94 

487.38 

479.65 

472.33 

464.82 

457.34 

449.64 

442.40 



LA 

846.62 

838.69 

830.76 

822.83 

814.91 

806.98 

799.05 

791.12 

783.19 

775.27 

767.34 

759.41 

751.48 

743.55 

735.62 

727.70 

719.77 

711.84 

703.91 

695.98 

688.06 

680.13 

672.20 

664.27 

656. 34 

648.42 

640.49 

632.56 

624.63 

616.70 

608.78 

600.85 

592.92 

984.99 

977.06 

569.13 

561.21 

953.28 

549.35 

537.42 

529.49 

521.57 

513.64 

505.71 

497.78 

489.89 

481.93 

474.00 

466.07 

458. 14 

450.21 

442.28 






Table A.2 continued

Score Conversion Table for Verbal Scale of
Form K-ZGR2



28.00 171.00 427.34 432.60 430.12 4l4.95 434.36 

27.00 190.00 419.42 424.$l 422*43 427.51 426.43 

26.00 166.00 41 I. S3 417.06 414.76 420.07 41S.50 

25.00 191.00 403.66 409.35 407.11 412.64 410.57 

24.00 169.00 395.83 401.67 399.48 405.22 402.64 

23.00 134.00 388.02 394.04 391.88 397.80 394.72 

22.00 128.00 380.26 386.46 384.29 390.39 3«6.79 

21.00 128.00 372.54 378.92 376.73 382.98 378.86 

20. 00 126.00 364.86 371.45 369.19 375.58 370.93 

19.00 106.00 357.24 364.04 361.60 368. 19 363.00 

18.00 116.00 349.68 356.69 354.21 360.81 355.08 

17.00 119.00 342.17 349.40 346-77 353.45 347.15 

16.00 98.00 334.73 342.17 339.37 346.10 339.22 

15.00 80.00 327.35 335.00 332.03' 338.79 331.29 

14.00 71.00 320.03 327.87 324.74 331.49 323.36 

13.00 69.00 312.78 320.77 317.50 324.23 315.44 

12.00 76.00 305.58 313.71 310.33 316.99 307.51 

11.00 46.00 298.44 306.65 303.22 309.79 299.58 

10.00 49.00 291.34 299.61 296.17 302.60 291.65 

9.00 53.00 284.29 292.56 289. 18 2<>5.45 283.72 

8.00 65.00 277.27 285.50 282.25 288.31 275.79 

7. 00 36. 00 2 70.28 278.43 275.3 8 281 . 17 267.87 

6.00 39.00 263.29 271.32 268.54 274.05 259.94 

5.00 43.00 ^.-Z56. 30 264.19 261.73 266.91 252.01 

4.00 26.00 249.30 257.01 254.93 259.76 244.08 

3.00 30.00 242.27 -.^49.79 248.12 252.57 2^6.15 

2.00 32.00 235. M j?42^52 241.29 245.35 228.23 

1.00 16. DO 228. 11* 235. 234.42 238.08 220.30 

0.0 19.00 220.98 227.89 227^50 230.76 212.37 

-1.00 13.00 213.83 220.51 220.54 *23.4l 204.44 

-2.00 19^f00 206.38 212.79 213.57 216.04 196.51 

-3.00 6.00 198.46 205.60 206.60 208.64 188.59 

-4.00 10.00 190.71 197.69 198.79 200.57 180.66 

-5.00 4.00 1»2.96 189.78 190.81 192.57 172.73 

-6.00 3.00 175.21 181.87 182.83 184.5 7 164.80 

-7.00 3.00 167.46 173,96 174.85 176.57 156.87 

-8.00 1 .00 159.71 166*05 166.8 7 168.5 7 148.95 

-9.00 0.0 151.96 158.14 158.89 160.57 141 .02 

-10.00 0.0 144.21 150.23 150.91 152.58 133.09 






Table A.3 continued



Score Conversion Table for Verbal Scale of 
Form K-ZGR3 



28,00 447.00 422.18 419.70 421.47 420.44 425.58 422.77 425.97 

2 7.00 42 7.00 413.71 411.42 4 13.02 412.17 417.56 414.57 417.49 

26.00 426.00 405.27 403.19 404.59 403.95 409.57 406.04 409.00 

25.00 410.00 396.84 395.02 396.18 395.77 401.60 396.97 400.52 

24.00 390.00 388.45 386.92 387.80 387.63 393.65 387.93 392.04 

23.00 395.00 380.09 378.89 379.46 379.53 385.72 378.93 383.56 

22.00 280.00 371. T9 370.95 371.17 371.4$ 377.82 370.12 375.07 

21.00 283.00 363.55 363.10 362.95 363.48 369.96 362.04 366.59 

20.00 265.00 355. 3H 355.33 354.80 355.54 362. 13 353.84 358.11 

19.00 202.00 347.30 347.66 346.74 347.67 354.34 346.13 349.62 

18.00 214.00 339.32 340.09 338.77 339.89 346.61 338.80 341.14 

17.00 195.00 331.44 332.60 330.91 332.19 338.95 331.54 332.64 

16.00 168.00 323.6 7 325.22 32 3.16 324.59 331.35 324.45 324.17 

15.00 163.00 316.02 317.92 315.54 317.11 323.83 316.88 315.69 

U.OO 153.00 109.50 310.72 3 08.04 309.74 316.40 308.59 307.21 

13.00 143.00 301.10 303.60 300.66 302.50 309.06 300.62 298.72 

12.00 119. 00 293.83 296.58 293.42 295.39 301.81 293.34 290.24 

H.OO US. 00 296.68 289.63 286.29 288.41 294.66 285.68 281.76 

10.00 113.00 279.66 282.79 279.29 281.56 287.59 276.77 273.27 

9.00 97.00 272.74 276.03 272.40 274.83 280.61 269.07 264.79 

8.00 88.00 265.93 269.36 265.61 268.22 273.71 262.43 256.31 

7.00 5B.00 25«>.22 262.78 258.94 261.71 266.89 255.99 247.82 

6.00 68.00 252.62 256.28 252.36 255.29 260.14 249.63 239.34 

5.00 54.00 246.11 249.86 245.88 248.96 253.46 242.53 230.86 

4.00 48.00 239.70 24|.53 239.51 242.70 246.84 235.84 222.37 

3.00 53.00 233.41 237.30 233.25 236.52 240.30 228.60 213.89 

2.00 29.00 227.26 231.19 227.11 230.4? 233.84 221.85 205.41 

I. 00 30.00 221.29 225.27 221.20 224.41 227.50 214.87 196.92 

0.0 . 25.00 215.58 219.66 215.52 218.56 221.32 2 05.36 188.44 

-1 . 00 11.00 210.25 214.70 210.22 212.94 215.38 1^4.79 179.96 

-2.00 7.00 205.14 210.09 205.14 207.72 209.83 186.83 171.48 

-3.00 4.00 197.40 202.24 197.40 202.46 204.25 180.88 162.99 

-4.00 2.30 189.66 194.40 189.66 194.62 196.40 177.00 154.51 

-5.00 5.00 181.92 186.56 181.92 186.78 188.54 170.13 146.03 

-6.00 0.0 174. 18 179.72 I 74 . 1 8 178.95 180.68 161.31 137.54 

-7.00 2.00 166.44 170.88 166.44 171.11 172.82 140.01 129.06 

-8.00 0.0 158.70 163.04 158.70 163.27 164.96 140.01 120.58 

-9.0J 0.0 150.96 155.20 150.96 155.43 157.4 1 s. 140 . 0 1 112.69 

-10.00 0.0 143.22 147.36 143.22 147.59 149.25 140.01 103.61 




Table A.3

Score Conversion Table for Verbal Scale of
Form K-ZGR3

RAW SCORE FREQ IAL IAH IAL2 IALH IALV EE LE



•0.00 1*00 8^6.11 846.11 846.11 •46.11 846.11 845.61 867.10 

79.00 1 .00 898.99 840.06 898.89 839.65 899.69 841.4 7 858. 6 1^ 

78:00 19.00 891w44 899.06 891.96 839.19 893.09 892.47 850.19 

77.00 1.00 824.29 826.49 824.17 826.70 826.55 826.28 841.65 

76.00 15.00 817.30 820. 12 817. 13 820.16 81^.85 819.52 899.16 

75.00 28.00 810.33 819.78 810.12 813.42 812.91 808.96 824.68 

74.00 31 .00 803.30 807.99 803.05 ff06.4 7 805.79 801.68 816.20 

73.00 46.00 796.16 800.70 795.88 799.29 798.91 794.61 807.72 

72.00 21.00 788.90 799.86 788.58 791.90 790.67 788.87 799.29 

71.00 53.00 781.50 786.80 781.14 784.92 782.86 782.56 790.75 

7Q.00 60.00 773.97 7 79.50 779.57 776.57 774.88 774.25 782.27 

69.00 7a. 00 766.31 .771.99 765.87 768.68 766.79 766.57 773.78 

68.00 102.00 758.55 764.29 758.07 760.68 758.61 758.19 765.30 

67.00 84.00 750.69 756.42 750.18 752.62 750.97 750.66 756.82 

66.00 99.00 742.76 748.40 742.22 744.49 742.08 744.06 748.33 

65.00 111.00 734.76 740.25 794.19 736.93 739.78 797.08 739.85 

64.00 129.00 726.71 792.01 726.11 728.14 725.45 729.90 791.97 

63.00 141 .00 718.60 729.67 717.98 719.92 717.12 722.93 722.811 

62.00 157.00 710.46 715.26 709.81 711.69 708.78 714.57 7l4.%0 

61.00 181.00 702.27 706.79 701.60 703.44 700.44 705.89 705.92 

60.00 175.00 694.04 698.27 699.35 695.17 692.09 697.46 697.41 

59.00 199.00 685.77 689.69 685.07 686.88 683.72 689.95 688.95 

58.00 223.00 677.46 681.07 676.74 678.54 675.33 682.38 680.47 

57.00 193«00 669. U 672.40 iS68.97 670.18 666.94 675.40 671.98 

56.00 249.00 660.79 669*69 659.97 661.78 658.59 668.05 669.50 

55.00 259.00 652.30 654.94 651.59 659.34 650.11 659.82 655.02 

54. 00 329.00 6 43. 84 646. 15 649.06 644.85 641 .6 7 651.18 646.59 

59.00 314.00 695.95 637.93 694*55 696.32 633.22 641.90 638.05 

52.00 311.00 626.83 628.47 626.02 627.74 624.75 633.12 629.57 

51.00 336.00 618.28 619.61 617.47 619.12 616.28 624.40 621.08 

50.00 395.00 609.72 610.72 608.90 610.45 607.81 615. 19 . 612.60 

49.00 427.00 601.15 601.83 600.33 601.76 599.34 605.75 604.12 

48.00 399.00 592.57 592.92 591.74 593.09 590.87 596.97 595.64 

4 7.00 418.00 583.99 584.02 583. 16 584.28 582.42 588.73 587. 15 

46.00 470.00 575.41 575.13 574.58 575.51 573.97 580.06 578.67 

45.00 423.00 566.84 566.25 566.01 S66. 72 565.54 571.82 570.19 

44.00 494.00 558.28 557.38 557.45 557.93 557.13 563.49 561.70 

43.00 529.00 549.72 548.54 548.90 549, L4 548.73 554.31 553.22 

42.00 445.00 541.18 539.72 540.36 540.36 540.36 545.69 544.74 

4 1.00 504.00 532.64 5 30.93 531.83 531.5 8 532.00 537.64 536.25 

40.00 567.00 524.12 522. 17 523.31 522.82 523.6 7 529.02 527.77 

39.00 551 .00 515.60 513.44 514.80 514.09 515.36 520.4 1 519.29 

38.00 543.09 507.09 504.74 506.29 505.38 507.07 511.86 510.80 

37.00 525.00 498.59 496.08 497.80 496.71 498.80 503.28 502.32 

36.00 573.00 490.09 487.45 489.30 488.07 4^0.57 494.76 493.84 

35.00 596.00 481.59 478.86 480.82 479.46 482.35 485.47 485.35 

34.00 541.00,473.10 470.30 472.33 470.90 474.16 476.28 476.87 

33.00 582.00 464.60 461.77 463.85 462.38 466.00 467.14 468.39 

32.00 519.00 456. 11 45 3.77 455.36 453.91 457.87 457.90 459.90 

31.00 528.00 447.6? 444.82 446.89 445.47 449.76 448.76 451.42 

30.00 476.00 439.14 436.40 438.41 437.09 441.68 439.63 442.94 

29.00 447.00 430.65 428.02 429.94 428.74 433.62 431.00 434.45 




Table A.4

Score Conversion Table for Verbal Scale of
Form 3CGR1

RAW SCORES FREQ IEP IES IESH IESV EE LE



75.00 
74.00 
73.00 
72.00 
71.00 
70.00 
69. 00 
68. 00 
67.00 
66.00 
65.00 
64.00 
63.00 
62.00 
M.OO 
60.00 
59.00 
58.00 
57.00 
56.00 
55. 00 
54.00 
53.00 
52.00 
51.00 
50.00 
49.00 
4 8.00 
^7.00 
46.00 
45.00 
44.00 
43.00 
42.00 
41.00 
40.00 
39.00 
31^.00 
37.00 
36.00 
35.00 
34. 00 
33.00 
32.00 
31.00 
30.00 
29.00 
28.00 
27.00 
26.00 
25.00 
24.00 



1.00 
9.00 
4.00 
2.00 
7.00 
11.00 
20.00 
28.00 
15.00 
21.00 
48.00 
49.00 
56.00 
37.00 
85.00 
79.00 
74.00 
100.00 
70.00 
127.00 
1 14.00 
131 .00 
148.00 
147.00 
1 70.00 
186.00 
189.00 
202.00 
215.00 
248.00 
209.00 
222.00 
273.00 
254.00 
284.00 
325.00 
304.00 
325.00 
294.00 
336.00 
337.00 
330.00 
355.00 
339.00 
345.00 
357.00 
359.00 
343.00 
352.00 
330.00 
364.00 
340.00 



846.11 
838.20 
8;^9.8 3 
821.72 
81^.80 
805.94 
798.05 
790.06 
781.95 
773.69 
765.29 
756.76 
748. 13 
739.42 
730.63 
721.81 
712.96 
704. 10 
695.24 
686.39 
677.55 
668.73 
659.93 
651.16 
642.42 
633. 70 
625.01 
616.35 
607.72 
599.13 
590.57 
582.04 
S73.55 
565.11 
556.70 
548.34 

540. oa 

531. 75 
523.53 
515. 36 
507.23 
499. 14 
491.10 
483. 10 
475.13 
467.20 
459.29 
451.41 
443.55 
435.70 
4?7.85 
420.01 



846.1 1 
838.90 
829.68 
820.4JB 
811.53 
802.76 
794.07 
785.43 
776.78 
768.13 
759.4 7 
750.82 
742.19 
733.59 
725.02 
716.48 
707. 9B 
699.51 
691.08 
682.67 
6 74 • 2 8 
665.90 
657.54 
649. 19 
640.85 
632.51 
624.17 
615.83 
607.50 
599.17 
590.83 
582.51 
574.18 
565.87 
557.56 
S49.26 
540.98 
532.71 
524.46 
516.24 
508.04 
499.07 
491.72 
483.60 
475.51 
467.44 
459.39 
451.37 
44T.36 
435.37 
427. 39 
419.42 



846.11 
838.01 
820.^5 
818.87 
809.95 
801.30 
792.82 
784.42 
776.07 
76 7.7 3 
759.41 
751.10 
742.80 
734.52 
726.25 
718.01 
709.77 
701.55 
693.34 
685. 13 
676.92 
668. 70 
660.47 
652.22 
643.95 
635.65 
627.32 
618.96 
610.57 
602.15 
593^70 
585.22 
576.73 
568.21 
559. 6ej 
551.14 
542.61 
534.09 
525.57 
517.0^ 
508.62 
500.19 
491.80 
483.45 
475. 14 
466.87 
458.64 
450^44 
442.29 
434. 16 
426.07 
41 7.99 



846. 1 1 
837.99 
828.13 
818.53 
809.32 
800. 38 
791.62 
782.96 
774.37 
765.82 
757.30 
748.82 
740.36 
731.94 
723.54 
715.18 
706.84 
698.52 
690.23 
681.96 
6 73.70 
665.46 
657.22 
640.99 
640.77 
632.56 
624. 34 
616. 1 3 
6107.93 
599.7a 
,591.52 
583.33 
5 75.^14 
566.97 
558.80 
550.64 
542.51 
534.38 
526.28 
918.20 
510. 15 
502.13 
494. 13 
4^6. 16 
478.22 
470.30 
462.41 
454.54 
446.69 
438.86 
431.03 
423.21 



836.46 
829.66 
822.42 
818.69 
813.94 
807.83 
798.34 
785.78 
775.83 
769.98 
761.44 
752.67 
741.88 
734.23 
726.67 
717.85 
709. 86 
701. 10 
693.87 
686.32 
677.26 
667.91 
658.89 
649.56 
640.53 
631.67 
622.24 
613.54 
604.95 
595.39 
586.80 
579.41 
571.15 
562.81 
554.59 
545.57 
536.83 
528.89 
521.20 
513.28 
505.05 
496.97 
489.00 
481.21 
473.47 
465.50 
457.62 
449.65 
442.37 
434.91 
426.84 
418.98 



1.70 



829. 
82U 
813.69 
805.68 
797.67 
789.65 
781.64 
773.63 
765.62 
757.61 
749.59 
741.58 
733.57 
725.56 
717.55 
709.53 
701.52 
693.51 
685.50 
677.49 
669.47 
661.46 
653.45 
645.44 
637.43 
629.41 
62U40 
613.39 
605.38 
597.37 
589.35 
581.34 
573.33 
565.32 
557.31 
549.29 
541.28 
$33.27 
525.26 
517.25 
509.23 
501.22 
493.21 
485.20 
477.19 
469.17 
461. 16 
453.15 
445.14 
437.13 
429.11 
421.10 






Table A.4 continued

Score Conversion Table for Verbal Scale of
Form 3CGR1



23.00 
22.00 
21.00 
20.00 
19.00 
18.90 
17.00 
16.00 
15.00 
1^.00 
13.00 
12.00 
11.00 
10.00 
9.00 
8.00 
7.00 
6. 00 
5.00 
4.00 
3.00 
2.00 
1.00 
0.0 
-1.00 
-2.00 
-3.00 
-4.03 
-•>.00 
-6.00 
-7.00 
-B.OO 



303.00 
332.00 
267.00 
311.00 
289.00 
263. OQ 
242.00 
242^00 
226.00 
222.00 
194.00 
196.00 
172.00 
164.00 
1S9.00 
176.00 
1 54.00 
152.00 
153.00 
115.00 
107.00 
75.00 
99.00 
74.00 
47.00 
36.00 
14.00 
14.00 
7.00 
6.00 
^3.00 
2.00 



411.46 

403.51 

395.56 

387.62 

379.66 

371.75 

363.83 

355.92 

348.01 

340.10 

332.19 

324.27 

316.33 

308.'37 

300.36 

292.35 

284.28 

2 76.17 

268.02 

259.84 

251.65 

243.46 

2)5.34 

227.26 

219.22 

211.05 

202.11 

194.01 

185.90 

177.60 

169.70 

161.59 



409.94 
401.91 
393.89 
385.86 
,377.89 
369.90 
361.93 
353.97 
346.0? 
336.08 
330.15 
322.24 
314.33 
306;44 
296.55 
290.67 
282.79 
274.91 
267.04 
259.16 
251.30 
243.44 
235.62 
227.86 
220.18 
212.57 
204.31 
t96.01 
167.84 
179.66 
171.49 
163.32 



415.40 

407.58 

399.77 

391.94 

384.12 

376.28 

369.43 

360.57 

352.70 

344.82 

336.91 

3?8.99 

321.04 

3n.06 

305.04 

296.97 

288.86 

280.69 

272.48 

264.22 

255.93 

247.6? 

239.35 

231.14 

223.03 

214.97 

206.16 

197.79 

189. 50 

181.40 

173.21 

165.02 



411.91 

404.27 

396.35 

368.60 

380.51 

372.87 

365.93 

359.21 

352.25 

344. 7<j 

337.41 

330.52 

323.51 

315.66 

308.07 

300.57 

292.12 

283.28 

272.78 

263.04 

255.38 

248.85 

241.06 

229.66 

;219.98 

211.56 

12^3.03 

195.35 

186.32 

176.70 

164. 17 

152.72 



244.841 

236. 83\ 

228.81: 

220.80 \ 

212. t9 

204.78 

196.77 

188.75 

180.74 

172.73 

164.72 




Table A.5

Score Conversion Table for Quantitative Scale of
Form ZGR1 (2/80)

RAW SCORES     FREQ      IAL       LA

     55.00     3.00   883.14   883.15
     54.00     6.00   870.44   870.49
     53.00     5.00   857.91   857.82
     52.00    19.00   845.56   845.16
     51.00    27.00   833.30   832.49
     50.00    32.00   820.96   819.83
     49.00    47.00   808.50   807.16
     48.00    34.00   795.94   794.50
     47.00    72.00   783.29   781.83
     46.00    70.00   770.56   769.16
     45.00   109.00   757.86   756.50
     44.00    73.00   745.13   743.83
     43.00   105.00   732.42   731.17
     42.00   121.00   719.73   718.50
     41.00   147.00   707.07   705.84
     40.00   137.00   694.44   693.17
     39.00   170.00   681.85   680.51
     38.00   176.00   669.29   667.84
     37.00   202.00   656.76   655.18
     36.00   20?.00   644.27   642.51
     35.00   214.00   631.79   629.84
     34.00   251.00   619.32   617.18
     33.00   240.00   606.86   604.51
     32.00   281.00   594.40   591.85
     31.00   273.00   581.93   579.18
     30.00   319.00   569.44   566.52
     29.00   312.00   556.94   553.85
     28.00   30?.00   544.40   541.19
     27.00   331.00   531.83   528.52
     26.00   341.00   519.23   515.85
     25.00   317.00   506.57   503.19
     24.00   304.00   493.86   490.52
     23.00   311.00   481.09   477.86
     22.00   26?.00   468.24   465.19
     21.00   270.00   455.33   452.53
     20.00   277.00   442.36   439.86
     19.00   273.00   429.34   427.20
     18.00   236.00   416.28   414.53
     17.00   207.00   403.20   401.87
     16.00   169.00   390.14   389.20
     15.00   150.00   377.10   376.53
     14.00   139.00   364.11   363.87
     13.00   109.00   351.16   351.20
     12.00   120.00   338.31   338.54
     11.00   106.00   325.51   325.87
     10.00   102.00   312.75   313.21
      9.00    68.00   300.04   300.54
      8.00    46.00   287.37   287.88
      7.00    46.00   274.74   275.21
      6.00    53.00   262.14   262.54
      5.00    55.00   249.57   249.88
      4.00    33.00   237.06   237.21






Table A.5 continued

Score Conversion Table for Quantitative Scale of
Form ZGR1 (2/80)

RAW SCORES     FREQ      IAL       LA

      3.00    32.00   224.60   224.55
      2.00    37.00   212.20   211.88
      1.00    2?.00   199.85   199.22
      0.0     33.00   187.53   186.55
     -1.00     8.00   175.17   173.89
     -2.00     8.00   162.71   161.22
     -3.00     ?.00   150.01   148.56
     -4.00     2.00   137.15   135.89
     -5.00     3.00   124.44   123.22
     -6.00     0.0    111.74   110.56
     -7.00     1.00    99.03    97.89
     -8.00     0.0     86.32    85.23
     -9.00     0.0     73.62    72.56
    -10.00     0.0     60.91    59.90






Table A.6

Score Conversion Table for Quantitative Scale of
Form K-ZGR2

RAW SCORES     FREQ      IAL       LA

     55.00     5.00   883.14   867.30
     54.00    16.00   861.69   854.87
     53.00     6.00   845.47   842.45
     52.00    27.00   830.89   830.03
     51.00    36.00   816.77   817.60
     50.00    60.00   802.76   805.18
     49.00    60.00   788.83   792.75
     48.00    48.00   775.05   780.33
     47.00    88.00   761.50   767.91
     46.00    97.00   748.23   755.48
     45.00   115.00   735.24   743.06
     44.00    74.00   722.52   730.64
     43.00   113.00   710.04   718.21
     42.00   136.00   697.75   705.79
     41.00   177.00   685.62   693.37
     40.00   142.00   673.60   680.94
     39.00   169.00   661.68   668.52
     38.00   208.00   649.82   656.09
     37.00   220.00   638.04   643.67
     36.00   ?15.00   626.32   631.25
     35.00   216.00   614.70   618.82
     34.00   246.00   603.19   606.40
     33.00   268.00   591.81   593.98
     32.00   268.00   580.58   581.55
     31.00   271.00   569.49   569.13
     30.00   290.00   558.55   556.70
     29.00   297.00   547.74   544.28
     28.00   327.00   537.03   531.86
     27.00   277.00   526.39   519.43
     26.00   298.00   515.77   507.01
     25.00   337.00   505.11   494.59
     24.00   308.00   494.36   482.16
     23.00   285.00   483.45   469.74
     22.00   258.00   472.33   457.31
     21.00   266.00   460.94   444.89
     20.00   260.00   449.23   432.47
     19.00   233.00   437.16   420.04
     18.00   216.00   424.70   407.62
     17.00   224.00   411.85   395.20
     16.00   197.00   398.62   382.77
     15.00   1?6.00   385.04   370.35
     14.00   148.00   371.16   357.93
     13.00   127.00   357.03   345.50
     12.00    96.00   342.70   333.08
     11.00    95.00   328.22   320.65
     10.00    90.00   313.65   308.23
      9.00    66.00   299.03   295.81
      8.00    57.00   284.44   283.38
      7.00    66.00   269.97   270.96
      6.00    38.00   255.75   258.54
      5.00    44.00   241.93   246.11
      4.00    30.00   228.65   233.69






Table A.6 continued

Score Conversion Table for Quantitative Scale of
Form K-ZGR2

RAW SCORES     FREQ      IAL       LA

      3.00    35.00   216.06   221.26
      2.00    29.00   204.27   208.84
      1.00    17.00   193.34   196.42
      0.0     11.00   183.24   183.99
     -1.00     ?.00   173.93   171.57
     -2.00     ?.00   165.29   159.15
     -3.00     2.00   157.09   146.72
     -4.00     3.00   148.06   134.30
     -5.00     0.0    134.24   121.88
     -6.00     1.00   121.08   109.45
     -7.00     0.0    107.93    97.03
     -8.00     0.0     94.77    84.60
     -9.00     0.0     81.61    72.18
    -10.00     0.0     68.45    59.76
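
Each conversion table in this appendix is, in effect, a lookup from raw score to scaled score under one equating. A minimal sketch of such a lookup in Python follows, using five rows copied from the LA column of Table A.6. The names `LA_TABLE_A6` and `convert` are hypothetical, and the linear interpolation between adjacent integer raw scores is an assumption of this sketch, not a procedure stated in the tables.

```python
import math

# Rows copied from Table A.6 (Form K-ZGR2, quantitative scale):
# raw score -> converted score in the LA column.
LA_TABLE_A6 = {
    55: 867.30,
    54: 854.87,
    53: 842.45,
    52: 830.03,
    51: 817.60,
}


def convert(raw_score, table):
    """Look up the converted score for a raw score.

    Integer raw scores are read directly from the table. Fractional raw
    scores are linearly interpolated between the two neighbouring rows
    (an assumption of this sketch). A KeyError is raised when a needed
    row is not in the table.
    """
    lower = math.floor(raw_score)
    if raw_score == lower:
        return table[lower]
    fraction = raw_score - lower
    return table[lower] + fraction * (table[lower + 1] - table[lower])


print(convert(54, LA_TABLE_A6))
print(convert(54.5, LA_TABLE_A6))
```

The same function works for any other column once its rows are entered as a dictionary keyed by raw score.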







Table A.7

Score Conversion Table for Quantitative Scale of
Form K-ZGR3

RAW SCORES FREQ IAL IAL2 LE

55.00 4.00 889.14 889.14 842.40 

54.00 6.00 848.18 849.67 890.81 

59.00 9.00 828.04 890.17 819.21 

52.00 10.00 811.64 814.26 807.61 

51.00 14.00 796.78 799.75 796.02 

50.00 25.00 782.79 785.94 784.42 

49.00 21.00 769.17 772.54 772.82 

48.00 29.00 755.99 759.42 761.29 

47.00 98.00 749. 14 746.5 7 749.69 

46.00 49.00 790.59 799.98 738.09 

45.00 42.00 718.99 721.64. 726.44 

44.00 52.00 706.94 709.55 714. 84 

49.00 4 6. 00 694.61 697.69 709 .24 . 

42.00 66.00 689. 10 686.05 691.65 

41.00 64.00 671.80 674.60 680.05 

40. 00 64.00 660.68 669.9 9 668.45 

99.00 87.00 649.72 6^2.20 656.86 

38. QO 109.00 69A.91 641.22 645.26 

97.00 9B.00 628.22 690.95 699.66 

96.00 104.00 617.66 619.61 622.07 

95.00 86.00 607.21 608*^97 610.47 

94.00 112.00 596.87 598.44 598.87 

99.00 124.00 586.69 568.01 587.28 

92.00 120.00 576.49 577.68 575.68 

91.00 1 10.00 566.42 567.4 2 564.08 

90.00 125.00 556.4? 557.22 552.49 

29.00 154.00 546.44 547.07 540.89 

26.00 150.00 536.47 596.91 ' 529.29 

2 7. 00 146. 00 526.46 526*^72 517.70 

26.00 160.00 516.38 S16.46 506. 10 

2 5.00 161 .00 506. 18 506.09 494.50 

24.00 152.00 495.83 495.55 482.91 

23.00 150.00 485.28 484.82 471.31 

22.00 145.00 474.50 473.85 459.71 

21.00 166.00 463.48 462.63 448.12 ' 

20.00 145.00 452.20 451.15 436.52 

19. 00 1 38.00 44 0.68 439.42 424.92 

18.0t> 146.00 428.94 427.48 413.33 

^ 17.00 142.00 417.04 415.38 401.73 

16.00 140.00 405.01 403.16 390.14 

15.00 120.00 392.93 390»90 378.54 

14.00 121.00 380.84 378.64 366.94 

13.00 92.00 368.78 366.42 355.35 

12.00 90.00 356.76 354.26. 343.75 

1 1. 00 88.00 344.79 342. 16' 3 32.15 

10.00 63.00 332.85 i30.lL 320.56 

9.00 73.00 320.91 318.08 308.96 

8. 00 63.00 308.92 301 .01 297.36 

7.pO 49.00 296.65 293.88 285.77 

6.00 44.00 264.65 281.64 274.17 

5.00 30.00 272.27 269.26 262.57 

4.00 . 21.00 259.72 256.73 250.98 






Table A.7 continued

Score Conversion Table for Quantitative Scale of
Form K-ZGR3

RAW SCORES     FREQ      IAL     IAL2       LE

      3.00    31.00   246.97   244.07   239.38
      2.00    15.00   234.06   231.31   227.78
      1.00    17.00   221.07   218.51   216.19
      0.0      9.00   208.00   205.71   204.59
     -1.00     6.00   194.88   192.93   192.99
     -2.00     3.00   181.56   180.05   181.40
     -3.00     2.00   167.76   166.72   169.80
     -4.00     0.0    152.49   152.11   158.20
     -5.00     0.0    137.05   137.05   146.61
     -6.00     0.0    123.97   123.97   135.01
     -7.00     0.0    110.88   110.88   123.41
     -8.00     0.0     97.79    97.79   111.82
     -9.00     0.0     84.70    84.70   100.22
    -10.00     0.0     71.61    71.61    88.62




Table A.8

Score Conversion Table for Quantitative Scale of
Form 3CGR1

RAW SCORE FREQ IEP IES EE LE

55«00 9.00 881.U 889.1^^ 877.82 897.06 

9^.00 21.00 859.86 861.92 8^9.13 825.7^ 

59.00 20.00 899.70 8^1.12 896.86 8U.^9 

5^. 00 98*00 820.6^ .822.5 7 82^.90 809. 1 1 

51.00 62.00 801.85 80^. 59^ 807.79 791.79 

•50.00 I09i00 789.51 787.07 785.65 780. «7 

^9.00 93.00 765.92 770.18 770.90 769.16 

^8.03 79.00 7*9^2? 75*.08 7-59.79 757. ft« 

47.00 122.00 799.66 798.80 745.87 746.52 

46,00 197.00 718.95 724.99 729.68 795.20 

45.00. 161.00 705.05 710.58 715.78 729.88 

44. 00 146.00 691. 84 697.4 5 704.01 ' 712.57 

49.00 183.00 679.21 684.85 . 691.87 701.25 

4 2.00 2 09.00 667.0 5 672.6 9 678.17 689.93 

41.00 224.00 655.90 660.90 665.49 678.61 

40.03 298.00 649.89 64|9.49 659.47 667.90 

99.00 299.00 692.79 698.25 642.51 655.*98 

98.00 e9 8.00 621.96"^ 627.9 9 632.4 9 >44.66 

9 7.00 252.00 61 1 . 9A 616.65 629.05 699.34 

>6. 00 9 05.00 601.04 606.2 1 619.28 62?. 0 9 

95.00 278.00 590.99 595*^99 603.76 610.71 

94.00 921.00 581.03 585.97 594.49 ' 599.99 

33.00 322.00 571.32 576.16 584.89 5dd.07 

32.00 354.00 561.78 566.52 575.16 576.75 

31.00 387.00 552.39 557.04 564.9^5 565.44 

30.00 419.00 543.12 547.69 554.57 554.12 

29.00 4 27^.00 533.94 538.4 3 544. 15 542.80 

28.00 424.00 524.80 529.23 533.96 531.48 

2^7.00 445'^Oa 515. 68„ 520.05 523.70 520*17 

26.00 4 49.00 506.54 510.85 513. 28 508. B5 

2 5. 00 4 87.00 497. 3 4 501.59 502.6 8 497.53 

24.00 506.(^0 4 88.0 3 492.22 492.13 4 86.21 

2 3.00 4 75.00 47Q.5 8 482.70 •,69 4 74.90 

22.00 441.00 468.95 473.00 471770 463.58 

21 . 00 4 56.0t) 459. 1 1 46 3.06 46l. 75 4 52.26 

. 20.00 463.00 449.02 452.87 451.47 \ 440.94 

19.00 460.00 '438.66 442.39 ,440.56 429.6? 

1 a. 00 4&6.00 428.01 431.6 0 429.21 4t8.3 1 

17.00 411.00 417.04 420.51 417.92 406.99 

16.00 4 50.00 405.76 409.1 1 40 5.68 395^6 7 

15.00 411.00 394.15 397.44 392.22 *384.35 

14.00 3 57.00 3 82.24 38*5.52 378.75 3 73.04. 

13.00 310.00 370.03 373.39 365.56 361.72 

12.00 3 04.00 357.54 361. 11 351.34 350.40 

11.00 263.00 344.79 348.70 335.89 339.08 

10.00 256.00 331.79 336.20 321.65 327.77 

9.00 229.00 318.55 323.6.3 306.89 316.45 

8.00 196.00 305.08 310.99 291. 77 305. 13 

- 7.00 159.00 291.39 298.2 8 276.45 293.81 

6.00 131.00 277.49 285.48 262.43 282.49 

5.00 10«.00 26 3.45 272.59 ?4 9.26 271^18 

4.00 84.00 249.34 259.61 235.41 259.86 



Table A.8 continued

Score Conversion Table for Quantitative Scale of
Form 3CGR1

RAW SCORE     FREQ      IEP      IES       EE       LE

     3.00    77.00   235.26   246.56   221.81   248.54
     2.00    50.00   221.35   233.51   210.46   237.22
     1.00    63.00   207.73   220.48   198.16   225.91
     0.0     37.00   194.45   207.54   180.61   214.59
    -1.00    2?.00   181.67   194.65   167.13   203.27
    -2.00    26.00   169.28   181.64   153.21   191.95
    -3.00    12.00   157.26   167.94   129.36   180.64
    -4.00     3.00   146.65   151.89   115.19   169.32
    -5.00     4.00   134.82   138.43   106.35   158.00
    -6.00     2.00   121.86   125.22    91.05   146.68
    -7.00     0.0    108.90   112.02    91.05   135.36
    -8.00     0.0     95.94    98.81    91.05   124.05
    -9.00     0.0     82.98    85.61    91.05   112.73
   -10.00     0.0     70.03    72.40    91.05   101.41




Table A.9

Score Conversion Table for Analytical Scale of
Form 3CGR1

RAW SCORE FREQ IEP IES EE LE

66.00 0.0 805.71 80^.71 797.55 813. <i8 

65.00 1. 00 790.62 793.48 797.55 80<i.8l 

64.00 7.00 777.32 782.63 773.61 796.14 

63.00 I. 00 764.85 772.58 769.43 787.48 

62.00 11.00 752«96 762.96 765.35 778.81 

61.00 17.00 741.55 753.65 758.86 770.14 

60.00 30.00 730.56 744.55 751.07 761.48 

59.00 68.00 719.96 735.62 735.78 752.81 

58.00 41.00 709.66 726.84 7?6.78 744.14* 

57.00 83.00 699.64 718.17 720«46 735.48 

56.00 76.00 689.83 709.61 714.39 726.81 

55.00 109.00 680.20 701.13 708.52 718.14 

54.00 170.00 670.72 692.71 700.01 709.48 

53.00 126.00 661.36 684.35 690.04 700.81 

52.00 .182.00 652.11 676.03 682.27 692.14 

51.00 214.00 642.94 667.75 674.70 6^3.48 

50.00 216.00 633. &4 659.49 667.49 674.81 

49.00 252.00 674.81 651.26 659.97 666.14 

4A,00 217.00 615.84 643.05 651.26 657.48 

47.00 241.00 606.92 634.86 643.18 648.81 

46.00 255.00 598.04 626.67 636.20 640.14 

45.00 288.00 589.22 618.50 629.19 631.48 

44.00 280.00 580.44 610.34 622.31 622.81 

43.00 281.00 571.69 602.18 615.15 614.14 

42.00 290.00 563.00 594.0) 606.98 605.48 

41.00 306.00 554.34 585.88 598.75 596.81 

40.00 344.00 545.74 577.74 590.^5 588.14 

39.00 332.00 537.18 569.60 582.88 579.48 

38.00 305.00 528.66 561.46 575.52 570.81 

37.00 343.00 520.19 553.32 566.91 562.14 

36.00 371.00 511.77 545.17 556.98 553.48 

35.00 352.00 503.40 537.02 548.13 544.81 

34.00 380.00 495.07 528.06 539.40 536.14 

33.00 286.00 486.80 520.69 531.29 527.48 

32.00 341.00 478.57 512.51 522.98 518.81 

31.00 351.00 470.39 504.32 513.70 510.14 

30.00 403.00 462.26 496.12 505.50 501.48 

29.00 325.00 454.17 487.89 497.58 492.81 

28.00 115.00 446.13 479.65 489.31 484.14 

27.00 318.00 438.14 471.39 480.38 475.48 

26.00 305.00 430.18 463.10 471.32 466.81 

25.00 295.00 422.27 454.79 467.79 458.14 

24.00 776.00 414.39 446.46 454.74 449.48 

23.00 313.00 406.54 438.09 446.94 440.81 

22.00 292.00 398.72 429.69 438.11 432.14 

21.00 276.00 390.92 421.26 428.84 423.48 

20.00 267.00 383.15 412.78 420.41 414.81 

19.00 259.00 375.38 404.26 412. dt) 406.14 

18.00 242.00 367.63 395.70 403.61 397.48 

17.00 255.00 359.87 387.09 394.87 388.81 

16.00 262.00 352.11 378.43 385.29 380.14 

15.00 231.00 344.34 369.70 376.01 371.48 






Table A.9 continued 

Score Conversion Table for Analytical Scale of 
Form 3CGR1 



14.00 237.00 336.55 360.91 367.37 362.81 

13.00 [illegible] 328.73 352.06 358.96 354.14 

12.00 226.00 320.87 343.12 349.49 345.48 

11.00 203.00 312.96 334.11 338.04 336.81 

10.00 200.00 304.99 325.01 327.79 328.14 

9.00 195.00 296.95 315.80 316.88 319.48 

8.00 196.00 288.81 306.49 305.37 310.81 

7.00 165.00 280.58 297.05 294.69 302.14 

6.00 187.00 272.22 287.47 283.85 293.48 

5.00 154.00 263.71 277.72 273.36 284.81 

4.00 176.00 255.02 267.77 261.71 276.14 

3.00 111.00 246.11 257.57 250.31 267.48 

2.00 117.00 236.95 247.06 240.23 258.81 

1.00 95.00 227.46 236.17 228.91 250.14 

0.0 76.00 217.58 224.80 217.85 241.48 

-1.00 58.00 207.17 212.79 207.57 232.81 

-2.00 28.00 196.03 199.91 199.04 224.14 

-3.00 25.00 183.52 185.77 193.02 215.48 

-4.00 25.00 173.47 174.79 184.30 206.81 

-5.00 10.00 164.71 165.96 171.12 198.14 

-6.00 5.00 155.95 157.14 163.76 189.48 

-7.00 0.0 147.19 148.32 160.84 180.81 

-8.00 3.00 138.43 139.49 159.09 172.14 

-9.00 3.00 129.66 130.67 151.44 163.48 






Appendix B 

Relative Efficiency Curves for Various 
Score Scales Produced by Different IRT 
Equating Methods on Forms 3CGR1, ZGR1, 
K-ZGR2, and K-ZGR3 



[Figures: relative efficiency curves; captions not legible in this copy] 



GRE BOARD RESEARCH 
REPORTS OF A TECHNICAL NATURE 



Boldt, R. R. Comparison of a Bayesian and a 
Least Squares Method of Educational 
Prediction. GREB No. 70-3P, June 
1975. 

Campbell, J. T. and Belcher, L. H. 
Word Associations of Students at 
Predominantly White and Predominantly 
Black Colleges. GREB No. 71-6P, 
December 1975. 

Campbell, J. T. and Donlon, T. F. Relationship 
of the Figure Location Test to 
Choice of Graduate Major. GREB No. 
75-7P, November 1980. 

Carlson, A. B.; Reilly, R. R.; Mahoney, M. 
H.; and Casserly, P. L. The 
Development and Pilot Testing of 
Criterion Rating Scales. GREB No. 
73-1P, October 1976. 

Carlson, A. B.; Evans, F.R.; and Kuykendall, 
N. M. The Feasibility of Common 
Criterion Validity Studies of the GRE. 
GREB No. 71-1P, July 1974. 

Donlon, T. F. An Exploratory Study of the 
Implications of Test Speededness. 
GREB No. 76-9P, March 1980. 

Donlon, T. F.; Reilly, R. R.; and McKee, J. 
D. Development of a Test of Global 
vs. Articulated Thinking: The Figure 
Location Test. GREB No. 74-9P, June 
1978. 

Echternacht, G. Alternate Methods of 
Equating GRE Advanced Tests. GREB No. 
69-2P, June 1974. 

Echternacht, G. A Comparison of Various 
Item Option Weighting Schemes/A 
Note on the Variances of Empirically 
Derived Option Scoring Weights. 
GREB No. 71-17P, February 1975. 

Echternacht, G. A Quick Method for 
Determining Test Bias. GREB No. 70-8P, 
July 1974. 

Evans, F. R. The GRE-Q Coaching/Instruction 
Study. GREB No. 71-5aP, September 
1977. 

Frederiksen, N. and Ward, W. C. Development 
of Measures for the Study of 
Creativity. GREB No. June 
1975. 

Levine, M. V. and Drasgow, F. Appropriateness 
Measurement with Aptitude Test 
Data and Estimated Parameters. 
GREB No. 75-3P, March 1980. 

McPeek, M.; Altman, R. A.; Wallmark, M.; and 
Wingersky, B. C. An Investigation of 
the Feasibility of Obtaining Additional 
Subscores on the GRE Advanced 
Psychology Test. GREB No. 74-4P, April 
1976. 



Pike, L. Implicit Guessing Strategies of GRE 
Aptitude Examinees Classified by Ethnic 
Group and Sex. GREB No. 75-10P, June 
1980. 

Powers, D. E.; Swinton, S.; Thayer, D.; 
and Yates, A. A Factor Analytic 
Investigation of Seven Experimental 
Analytical Item Types. GREB No. 
77-1P, June 1978. 

Powers, D. E.; Swinton, S. S.; and Carlson, 
A. B. A Factor Analytic Study of 
the GRE Aptitude Test. GREB No. 
75-11P, September 1977. 

Reilly, R. R. and Jackson, R. Effects 
of Empirical Option Weighting on 
Reliability and Validity of the GRE. 
GREB No. 71-9P, July 1974. 

Reilly, R. R. Factors in Graduate Student 
Performance. GREB No. 71-2P, July 
1974. 

Rock, D. A. The Identification of 
Population Moderators and Their 
Effect on the Prediction of Doctorate 
Attainment. GREB No. 69-6bP, February 
1975. 



Rock, D. A. The "Test Chooser": A Different 
Approach to a Prediction Weighting 
Scheme. GREB No. 70-2P, November 
1974. 

Sharon, A. T. Test of English as a Foreign 
Language as a Moderator of Graduate 
Record Examinations Scores in the 
Prediction of Foreign Students' Grades 
in Graduate School. GREB No. 70-1P, 
June 1974. 

Stricker, L. J. A New Index of Differential 
Subgroup Performance: Application to 
the GRE Aptitude Test. GREB No. 78-7P, 
June 1981. 

Swinton, S. S. and Powers, D. E. A Factor 
Analytic Study of the Restructured 
GRE Aptitude Test. GREB No. 77-6P, 
February 1980. 

Ward, W. C. A Comparison of Free-Response 
and Multiple-Choice Forms of Verbal 
Aptitude Tests. GREB No. 79-8P, 
January 1982. 



Ward, W. C.; Frederiksen, N.; and Carlson, 
S. B. Construct Validity of Free-Response 
and Machine-Scorable Versions 
of a Test of Scientific Thinking. 
GREB No. 74-8P, November 1978. 

Ward, W. C. and Frederiksen, N. A Study of 
the Predictive Validity of the Tests 
of Scientific Thinking. GREB No. 
74-6P, October 1977. 



